Surrounding Text Support in Mozc
================================

Objective
---------

Utilize surrounding text information to achieve a more efficient and intelligent text input experience.

Design Highlights
-----------------

### Temporary history invalidation

The Mozc converter internally maintains history segments, mainly for users who input a Japanese sentence in fragments, committing one segment at a time. Imagine that a user inputs the example sentence “今日は良い天気です” as 3 segments as follows.

  1. kyouha (今日は) -> convert -> commit
  1. yoi (良い) -> convert -> commit
  1. tennkidesu (天気です) -> convert -> commit

At step 3, the Mozc converter takes the results of steps 1 and 2 into consideration when "tennkidesu" is converted. However, this approach may not work well when the caret position is moved without the Mozc converter noticing. To work around this situation, the Mozc converter can read the preceding text and check whether its internal history information is consistent with that text. If they are inconsistent, the history segments should be invalidated.

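The consistency check described above can be sketched as follows. This is a minimal illustration, not Mozc's actual implementation: the `HistorySegment` type and both function names are hypothetical, and the sketch assumes that "consistent" means the text before the caret ends with the concatenated committed values.

```python
from dataclasses import dataclass


@dataclass
class HistorySegment:
    """Hypothetical stand-in for a committed segment (not Mozc's real type)."""
    key: str    # reading, e.g. "きょうは"
    value: str  # committed surface form, e.g. "今日は"


def is_history_consistent(history, preceding_text):
    """Return True if the preceding text ends with the committed history values."""
    committed = "".join(segment.value for segment in history)
    return preceding_text.endswith(committed)


def validate_history(history, preceding_text):
    """Keep the history if it matches the preceding text, otherwise drop it."""
    if is_history_consistent(history, preceding_text):
        return history
    return []  # assumption: invalidation simply discards all history segments


history = [HistorySegment("きょうは", "今日は"), HistorySegment("よい", "良い")]
kept = validate_history(history, "今日は良い")       # caret is right after the commits
dropped = validate_history(history, "別の文章です")  # caret was moved elsewhere
```

In this sketch `kept` still holds both segments, while `dropped` is empty because the preceding text no longer ends with "今日は良い".
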
### History reconstruction

To improve the conversion quality when the preceding text and the history segments are mismatched, it would be useful to reconstruct (or emulate) history segments from the preceding text.

As a first step, this project reconstructs only segments that consist solely of digits or alphabetic characters. Reconstructing a wider variety of tokens is left as future work.

The following table describes the mapping from a preceding text to a key/value pair and a POS (part-of-speech) ID.

| Preceding Text     | Key     | Value     | POS        |
|:-------------------|:--------|:----------|:-----------|
| "10"               | "10"    | "10"      | Number     |
| "10 "              | "10"    | "10"      | Number     |
| "1 10 "            | "10"    | "10"      | Number     |
| "C60"              | "60"    | "60"      | Number     |
| "abc"              | "abc"   | "abc"     | UniqueNoun |
| "this is"          | "is"    | "is"      | UniqueNoun |
| "あ"               | N/A     | N/A       | N/A        |

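One way to reproduce the rows of the table above is sketched below. The rule used here is an assumption inferred from those rows only: strip trailing spaces, then take the trailing run of ASCII digits (POS "Number"), or else the trailing run of ASCII letters (POS "UniqueNoun"), and otherwise reconstruct nothing. The names `ReconstructedSegment` and `reconstruct_history` are hypothetical, and inputs the table does not cover (for example "60C") may be handled differently by the real converter.

```python
import re
from typing import NamedTuple, Optional


class ReconstructedSegment(NamedTuple):
    """Hypothetical result type; the POS is kept as a plain label, not a real POS ID."""
    key: str
    value: str
    pos: str  # "Number" or "UniqueNoun", as named in the table


# \Z anchors the match at the very end of the string.
_TRAILING_DIGITS = re.compile(r"[0-9]+\Z")
_TRAILING_LETTERS = re.compile(r"[A-Za-z]+\Z")


def reconstruct_history(preceding_text: str) -> Optional[ReconstructedSegment]:
    """Rebuild one history segment from the end of the preceding text, if possible."""
    text = preceding_text.rstrip(" ")  # assumption: trailing spaces are ignored
    # Digits are tried first so that "C60" yields "60" rather than nothing.
    for pattern, pos in ((_TRAILING_DIGITS, "Number"),
                         (_TRAILING_LETTERS, "UniqueNoun")):
        match = pattern.search(text)
        if match:
            token = match.group()
            return ReconstructedSegment(key=token, value=token, pos=pos)
    return None  # e.g. "あ": nothing to reconstruct in this first step


for sample in ["10", "10 ", "1 10 ", "C60", "abc", "this is", "あ"]:
    print(repr(sample), "->", reconstruct_history(sample))
```
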
Scope
-----

Here is a list of typical cases in which the preceding text and the history segments are mismatched.

  * Multiple users are writing the same document (e.g. Google Docs).
  * A user prefers to turn the IME off when inputting alphanumeric characters. For example, the user inputs "今日は Andy に会う" with the following steps:
    1. Turn IME on
    1. Type "kyouha" then convert it to "今日は"
    1. Turn IME off
    1. Type " Andy "
    1. Turn IME on
    1. Type "niau" then convert it to "に会う"
  * The caret position is moved with the mouse.

Surrounding text has been available in the following OSes and frameworks:

  * Windows OS
    * Microsoft Internet Explorer
    * Google Chrome 17+
    * Mozilla Firefox
    * Microsoft Office
    * Windows Presentation Foundation (WPF)
  * Apple OS X
  * Android OS
  * Chromium OS

Here is a list of other possible uses of surrounding text in future projects.

  * Language detection.
  * Character width (narrow/wide) adjustment.
  * Personal name recognition (e.g., SNS screen names).

Risk
----

Some buggy applications that handle surrounding text events incorrectly may become unstable. In principle there should be no privacy risk, because applications are expected to hide sensitive text such as passwords from the IME.

Production Impact
-----------------

Available on Windows, Apple OS X, Chromium OS and Linux desktop. No impact on the Android platform.

Release History
---------------

  * Initial release: 1.11.1490.10x

Reference
---------

  * [chrome.input.ime](http://developer.chrome.com/extensions/input.ime.html)