Surrounding Text Support in Mozc
================================

Objective
---------

Use surrounding text information to provide a more efficient and intelligent text input experience.

Design Highlights
-----------------

### Temporary history invalidation

The Mozc converter internally maintains history segments, mainly for users who input a Japanese sentence as a series of separately converted segments. Imagine that a user inputs the example sentence “今日は良い天気です” (“It is nice weather today”) as three segments:

 1. kyouha (今日は) -> convert -> commit
 1. yoi (良い) -> convert -> commit
 1. tennkidesu (天気です) -> convert -> commit

At step 3, the Mozc converter takes the results of steps 1 and 2 into account when converting "tennkidesu". However, this approach may not work well when the caret is moved without the converter noticing. To work around this, the Mozc converter can read the text preceding the caret and check whether its internal history is consistent with that text. If they are inconsistent, the history segments should be invalidated.

### History reconstruction

To improve conversion quality when the preceding text and the history segments are mismatched, it would be useful to reconstruct (or emulate) history segments from the preceding text.

As a first step, this project reconstructs only segments that consist entirely of digits or alphabetic characters. Reconstructing a wider variety of tokens is future work.

The table below describes the mapping from preceding text to key, value, and POS (part-of-speech) ID.
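
The invalidation check and the reconstruction rule can be illustrated with a short Python sketch. It is not Mozc's implementation: the names `HistorySegment`, `is_history_consistent`, `reconstruct_history`, and `refresh_history` are hypothetical; "consistent" is assumed to mean that the preceding text ends with the concatenated values of the history segments; and the reconstruction rule (drop trailing whitespace, then take the trailing run of ASCII digits or ASCII letters) is inferred from the rows of the table below rather than taken from Mozc's source.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HistorySegment:
    """Hypothetical stand-in for a Mozc history segment."""
    key: str    # reading used for conversion
    value: str  # committed surface text
    pos: str    # part-of-speech label, e.g. "Number" or "UniqueNoun"


# Assumed rule (inferred from the table): the trailing run of ASCII digits
# or ASCII letters at the end of the preceding text.
_TRAILING_TOKEN = re.compile(r"([0-9]+|[A-Za-z]+)$")


def is_history_consistent(history: List[HistorySegment], preceding_text: str) -> bool:
    """Assumes history is consistent if the preceding text ends with its values."""
    committed = "".join(segment.value for segment in history)
    return preceding_text.endswith(committed)


def reconstruct_history(preceding_text: str) -> Optional[HistorySegment]:
    """Emulates one history segment from the preceding text, or returns None."""
    match = _TRAILING_TOKEN.search(preceding_text.rstrip())
    if match is None:
        return None  # e.g. "あ": only digits and ASCII letters are handled
    token = match.group(1)
    pos = "Number" if token.isdigit() else "UniqueNoun"
    return HistorySegment(key=token, value=token, pos=pos)


def refresh_history(history: List[HistorySegment], preceding_text: str) -> List[HistorySegment]:
    """Keeps non-empty, consistent history; otherwise drops it and tries to rebuild it."""
    if history and is_history_consistent(history, preceding_text):
        return history
    rebuilt = reconstruct_history(preceding_text)
    return [rebuilt] if rebuilt is not None else []


if __name__ == "__main__":
    # Rows from the table: preceding text -> reconstructed segment (or None).
    for text in ["10", "10 ", "1 10 ", "C60", "abc", "this is", "あ"]:
        print(repr(text), "->", reconstruct_history(text))

    # IME-off scenario: "今日は" was committed, then " Andy " was typed directly.
    history = [HistorySegment(key="きょうは", value="今日は", pos="Noun")]  # hypothetical POS
    print(refresh_history(history, "今日は"))        # consistent: history kept
    print(refresh_history(history, "今日は Andy "))  # inconsistent: rebuilt from "Andy"
```

For each preceding text in the table below, `reconstruct_history` yields the listed Key, Value, and POS, and `None` for "あ". Matching only the trailing token, rather than the whole text, is what turns "1 10 " into "10" and "this is" into "is".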

| Preceding Text | Key     | Value   | POS        |
|:---------------|:--------|:--------|:-----------|
| "10"           | "10"    | "10"    | Number     |
| "10 "          | "10"    | "10"    | Number     |
| "1 10 "        | "10"    | "10"    | Number     |
| "C60"          | "60"    | "60"    | Number     |
| "abc"          | "abc"   | "abc"   | UniqueNoun |
| "this is"      | "is"    | "is"    | UniqueNoun |
| "あ"           | N/A     | N/A     | N/A        |

Scope
-----

Typical cases in which the preceding text and the history segments are mismatched:

 * Multiple users are editing the same document (e.g., Google Docs).
 * A user prefers to turn the IME off when typing alphanumeric characters. For example, the user inputs "今日は Andy に会う" ("I will meet Andy today") with the following steps:
   1. Turn IME on
   1. Type "kyouha" then convert it to "今日は"
   1. Turn IME off
   1. Type " Andy "
   1. Turn IME on
   1. Type "niau" then convert it to "に会う"
 * The caret is moved with the mouse.

Surrounding text is available on the following OSes and frameworks:

 * Windows OS
   * Microsoft Internet Explorer
   * Google Chrome 17+
   * Mozilla Firefox
   * Microsoft Office
   * Windows Presentation Foundation (WPF)
 * Apple OS X
 * Android OS
 * Chromium OS

Other possible uses of surrounding text in future projects:

 * Language detection
 * Character width (narrow/wide) adjustment
 * Personal name recognition (e.g., SNS screen names)

Risk
----

Some buggy applications that mishandle surrounding text events may become unstable. There should be essentially no privacy risk, because applications are expected to hide sensitive text such as passwords from the IME.

Production Impact
-----------------

Available on Windows, Apple OS X, Chromium OS, and Linux desktop. No impact on the Android platform.

Release History
---------------

 * Initial release: 1.11.1490.10x

Reference
---------

 * [chrome.input.ime](http://developer.chrome.com/extensions/input.ime.html)