\section{Character}
\label{group__m17nCharacter}\index{Character@{Character}}
Character objects and API for them.
\subsection*{Variables: Keys of character properties}
These symbols are used as keys of character properties. \begin{CompactItemize}
\item
{\bf MSymbol} {\bf Mscript}
\begin{CompactList}\small\item\em Key for script. \item\end{CompactList}\item
{\bf MSymbol} {\bf Mname}
\begin{CompactList}\small\item\em Key for character name. \item\end{CompactList}\item
{\bf MSymbol} {\bf Mcategory}
\begin{CompactList}\small\item\em Key for general category. \item\end{CompactList}\item
{\bf MSymbol} {\bf Mcombining\_\-class}
\begin{CompactList}\small\item\em Key for canonical combining class. \item\end{CompactList}\item
{\bf MSymbol} {\bf Mbidi\_\-category}
\begin{CompactList}\small\item\em Key for bidi category. \item\end{CompactList}\item
{\bf MSymbol} {\bf Msimple\_\-case\_\-folding}
\begin{CompactList}\small\item\em Key for corresponding single lowercase character. \item\end{CompactList}\item
{\bf MSymbol} {\bf Mcomplicated\_\-case\_\-folding}
\begin{CompactList}\small\item\em Key for corresponding multiple lowercase characters. \item\end{CompactList}\item
{\bf MSymbol} {\bf Mcased}
\begin{CompactList}\small\item\em Key for values used in case operation. \item\end{CompactList}\item
{\bf MSymbol} {\bf Msoft\_\-dotted}
\begin{CompactList}\small\item\em Key for values used in case operation. \item\end{CompactList}\item
{\bf MSymbol} {\bf Mcase\_\-mapping}
\begin{CompactList}\small\item\em Key for values used in case operation. \item\end{CompactList}\item
{\bf MSymbol} {\bf Mblock}
\begin{CompactList}\small\item\em Key for script block name. \item\end{CompactList}\end{CompactItemize}
\subsection*{Defines}
\begin{CompactItemize}
\item
\#define {\bf MCHAR\_\-MAX}
\begin{CompactList}\small\item\em Maximum character code.
\item\end{CompactList}\end{CompactItemize}
\subsection*{Functions}
\begin{CompactItemize}
\item
{\bf MSymbol} {\bf mchar\_\-define\_\-property} (const char $\ast$name, {\bf MSymbol} type)
\begin{CompactList}\small\item\em Define a character property. \item\end{CompactList}\item
void $\ast$ {\bf mchar\_\-get\_\-prop} (int c, {\bf MSymbol} key)
\begin{CompactList}\small\item\em Get the value of a character property. \item\end{CompactList}\item
int {\bf mchar\_\-put\_\-prop} (int c, {\bf MSymbol} key, void $\ast$val)
\begin{CompactList}\small\item\em Set the value of a character property. \item\end{CompactList}\item
{\bf MCharTable} $\ast$ {\bf mchar\_\-get\_\-prop\_\-table} ({\bf MSymbol} key, {\bf MSymbol} $\ast$type)
\begin{CompactList}\small\item\em Get the char-table for a character property. \item\end{CompactList}\end{CompactItemize}


\subsection{Detailed Description}
Character objects and API for them.

The m17n library represents a {\em character\/} by a character code (an integer). The minimum character code is {\tt 0}. The maximum character code is defined by the macro \doxyref{MCHAR\_\-MAX}{p.}{group__m17nCharacter_gdb36cc417b000c5f9f028992f69b5ebc}. \doxyref{MCHAR\_\-MAX}{p.}{group__m17nCharacter_gdb36cc417b000c5f9f028992f69b5ebc} is guaranteed to be no smaller than {\tt 0x3FFFFF} (22 bits).

Characters {\tt 0} to {\tt 0x10FFFF} are equivalent to the Unicode characters of the same code values.

A character can have zero or more properties called {\em character\/} {\em properties\/}. A character property consists of a {\em key\/} and a {\em value\/}, where the key is a symbol and the value is anything that can be cast to {\tt (void $\ast$)}. \char`\"{}The character property that belongs to character C and whose key is K\char`\"{} may be shortened to \char`\"{}the K property of C\char`\"{}.
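
The code ranges described above can be illustrated with a minimal C sketch. It deliberately does not include the m17n headers: {\tt SKETCH\_\-UNICODE\_\-MAX}, {\tt SKETCH\_\-MCHAR\_\-MAX\_\-FLOOR}, and {\tt sketch\_\-classify()} are hypothetical names introduced only for illustration, and the second constant is the guaranteed lower bound of MCHAR\_\-MAX, not necessarily its actual value.

```c
/* Hypothetical constants for illustration; not part of the m17n API. */
#define SKETCH_UNICODE_MAX     0x10FFFF /* last code equivalent to Unicode */
#define SKETCH_MCHAR_MAX_FLOOR 0x3FFFFF /* guaranteed lower bound of MCHAR_MAX */

enum sketch_range
{
  SKETCH_INVALID,     /* negative: below the minimum character code 0 */
  SKETCH_UNICODE,     /* 0 .. 0x10FFFF: same as the Unicode character */
  SKETCH_NON_UNICODE, /* above Unicode, but valid in every build */
  SKETCH_UNKNOWN      /* validity depends on the actual MCHAR_MAX */
};

/* Classify the character code c against the documented ranges. */
static enum sketch_range
sketch_classify (int c)
{
  if (c < 0)
    return SKETCH_INVALID;
  if (c <= SKETCH_UNICODE_MAX)
    return SKETCH_UNICODE;
  if (c <= SKETCH_MCHAR_MAX_FLOOR)
    return SKETCH_NON_UNICODE;
  return SKETCH_UNKNOWN;
}
```

A program built against the library would compare against MCHAR\_\-MAX itself rather than the lower bound used in this sketch.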

\subsection{Define Documentation}
\index{m17nCharacter@{m17nCharacter}!MCHAR\_\-MAX@{MCHAR\_\-MAX}}
\index{MCHAR\_\-MAX@{MCHAR\_\-MAX}!m17nCharacter@{m17nCharacter}}
\subsubsection[MCHAR\_\-MAX]{\setlength{\rightskip}{0pt plus 5cm}\#define MCHAR\_\-MAX}\label{group__m17nCharacter_gdb36cc417b000c5f9f028992f69b5ebc}


Maximum character code.

The macro \doxyref{MCHAR\_\-MAX}{p.}{group__m17nCharacter_gdb36cc417b000c5f9f028992f69b5ebc} gives the maximum character code.

\subsection{Function Documentation}
\index{m17nCharacter@{m17nCharacter}!mchar\_\-define\_\-property@{mchar\_\-define\_\-property}}
\index{mchar\_\-define\_\-property@{mchar\_\-define\_\-property}!m17nCharacter@{m17nCharacter}}
\subsubsection[mchar\_\-define\_\-property]{\setlength{\rightskip}{0pt plus 5cm}{\bf MSymbol} mchar\_\-define\_\-property (const char $\ast$ {\em name}, \/ {\bf MSymbol} {\em type})}\label{group__m17nCharacter_g8c6dde5d282ae96c899f662e1dc17879}


Define a character property.

The \doxyref{mchar\_\-define\_\-property()}{p.}{group__m17nCharacter_g8c6dde5d282ae96c899f662e1dc17879} function searches the m17n database for data whose tags are $<$\doxyref{Mchar\_\-table}{p.}{group__m17nChartable_g91e88555aace667aa53a16e5fbb4226c}, {\bf type}, {\bf sym} $>$. Here, {\bf sym} is a symbol whose name is {\bf name}. {\bf type} must be \doxyref{Mstring}{p.}{group__m17nSymbol_g60daf7d600a1f487862366a37c171ce5}, \doxyref{Mtext}{p.}{group__m17nPlist_g1a22859374071a0ca66f12452afee8bd}, \doxyref{Msymbol}{p.}{group__m17nSymbol_g6592d4eb3c46fe7fb8993c252b8fedeb}, \doxyref{Minteger}{p.}{group__m17nPlist_g0ce08eb57aa339db4d4745e75e80fdd8}, or \doxyref{Mplist}{p.}{group__m17nPlist_g933000e154873f9bfcaa56d976bd259b}.

\begin{Desc}
\item[Return value:]If the operation was successful, \doxyref{mchar\_\-define\_\-property()}{p.}{group__m17nCharacter_g8c6dde5d282ae96c899f662e1dc17879} returns {\bf sym}.
Otherwise it returns \doxyref{Mnil}{p.}{group__m17nSymbol_g0346fc05efcccc8f11271b51c0fe3eeb}.\end{Desc}
\begin{Desc}
\item[Errors:]{\tt MERROR\_\-DB} \end{Desc}
\begin{Desc}
\item[See Also:]\doxyref{mchar\_\-get\_\-prop()}{p.}{group__m17nCharacter_g66ef808ae3cf10d8080d579a993c6459}, \doxyref{mchar\_\-put\_\-prop()}{p.}{group__m17nCharacter_g2dc345ba89a546f861b141a71d1609f7} \end{Desc}
\index{m17nCharacter@{m17nCharacter}!mchar\_\-get\_\-prop@{mchar\_\-get\_\-prop}}
\index{mchar\_\-get\_\-prop@{mchar\_\-get\_\-prop}!m17nCharacter@{m17nCharacter}}
\subsubsection[mchar\_\-get\_\-prop]{\setlength{\rightskip}{0pt plus 5cm}void$\ast$ mchar\_\-get\_\-prop (int {\em c}, \/ {\bf MSymbol} {\em key})}\label{group__m17nCharacter_g66ef808ae3cf10d8080d579a993c6459}


Get the value of a character property.

The \doxyref{mchar\_\-get\_\-prop()}{p.}{group__m17nCharacter_g66ef808ae3cf10d8080d579a993c6459} function searches character {\bf c} for the character property whose key is {\bf key}.

\begin{Desc}
\item[Return value:]If the operation was successful, \doxyref{mchar\_\-get\_\-prop()}{p.}{group__m17nCharacter_g66ef808ae3cf10d8080d579a993c6459} returns the value of the character property.
Otherwise it returns {\tt NULL}.\end{Desc}
\begin{Desc}
\item[Errors:]{\tt MERROR\_\-SYMBOL}, {\tt MERROR\_\-DB} \end{Desc}
\begin{Desc}
\item[See Also:]\doxyref{mchar\_\-define\_\-property()}{p.}{group__m17nCharacter_g8c6dde5d282ae96c899f662e1dc17879}, \doxyref{mchar\_\-put\_\-prop()}{p.}{group__m17nCharacter_g2dc345ba89a546f861b141a71d1609f7} \end{Desc}
\index{m17nCharacter@{m17nCharacter}!mchar\_\-put\_\-prop@{mchar\_\-put\_\-prop}}
\index{mchar\_\-put\_\-prop@{mchar\_\-put\_\-prop}!m17nCharacter@{m17nCharacter}}
\subsubsection[mchar\_\-put\_\-prop]{\setlength{\rightskip}{0pt plus 5cm}int mchar\_\-put\_\-prop (int {\em c}, \/ {\bf MSymbol} {\em key}, \/ void $\ast$ {\em val})}\label{group__m17nCharacter_g2dc345ba89a546f861b141a71d1609f7}


Set the value of a character property.

The \doxyref{mchar\_\-put\_\-prop()}{p.}{group__m17nCharacter_g2dc345ba89a546f861b141a71d1609f7} function searches character {\bf c} for the character property whose key is {\bf key} and assigns {\bf val} to the value of the found property.

\begin{Desc}
\item[Return value:]If the operation was successful, \doxyref{mchar\_\-put\_\-prop()}{p.}{group__m17nCharacter_g2dc345ba89a546f861b141a71d1609f7} returns 0.
Otherwise, it returns -1.\end{Desc}
\begin{Desc}
\item[Errors:]{\tt MERROR\_\-SYMBOL}, {\tt MERROR\_\-DB} \end{Desc}
\begin{Desc}
\item[See Also:]\doxyref{mchar\_\-define\_\-property()}{p.}{group__m17nCharacter_g8c6dde5d282ae96c899f662e1dc17879}, \doxyref{mchar\_\-get\_\-prop()}{p.}{group__m17nCharacter_g66ef808ae3cf10d8080d579a993c6459} \end{Desc}
\index{m17nCharacter@{m17nCharacter}!mchar\_\-get\_\-prop\_\-table@{mchar\_\-get\_\-prop\_\-table}}
\index{mchar\_\-get\_\-prop\_\-table@{mchar\_\-get\_\-prop\_\-table}!m17nCharacter@{m17nCharacter}}
\subsubsection[mchar\_\-get\_\-prop\_\-table]{\setlength{\rightskip}{0pt plus 5cm}{\bf MCharTable}$\ast$ mchar\_\-get\_\-prop\_\-table ({\bf MSymbol} {\em key}, \/ {\bf MSymbol} $\ast$ {\em type})}\label{group__m17nCharacter_ga44bd8292de2055556e05cf02cf1292f}


Get the char-table for a character property.

The \doxyref{mchar\_\-get\_\-prop\_\-table()}{p.}{group__m17nCharacter_ga44bd8292de2055556e05cf02cf1292f} function returns a char-table that contains the character property whose key is {\bf key}. If {\bf type} is not NULL, this function stores the type of the property in the location pointed to by {\bf type}. See \doxyref{mchar\_\-define\_\-property()}{p.}{group__m17nCharacter_g8c6dde5d282ae96c899f662e1dc17879} for types of character property.

\begin{Desc}
\item[Return value:]If {\bf key} is a valid character property key, this function returns a char-table. Otherwise NULL is returned. \end{Desc}


\subsection{Variable Documentation}
\index{m17nCharacter@{m17nCharacter}!Mscript@{Mscript}}
\index{Mscript@{Mscript}!m17nCharacter@{m17nCharacter}}
\subsubsection[Mscript]{\setlength{\rightskip}{0pt plus 5cm}{\bf MSymbol} {\bf Mscript}}\label{group__m17nCharacter_g1efea11830fa151fad724fbdc4212750}


Key for script.

The symbol \doxyref{Mscript}{p.}{group__m17nCharacter_g1efea11830fa151fad724fbdc4212750} has the name {\tt \char`\"{}script\char`\"{}} and is used as the key of a character property. The value of such a property is a symbol representing the script to which the character belongs.

Each symbol that represents a script has one of the names listed in the {\em Unicode Technical Report \#24\/}. \index{m17nCharacter@{m17nCharacter}!Mname@{Mname}}
\index{Mname@{Mname}!m17nCharacter@{m17nCharacter}}
\subsubsection[Mname]{\setlength{\rightskip}{0pt plus 5cm}{\bf MSymbol} {\bf Mname}}\label{group__m17nCharacter_g4848713c0a3c225f3600e10d9ae56631}


Key for character name.

The symbol \doxyref{Mname}{p.}{group__m17nCharacter_g4848713c0a3c225f3600e10d9ae56631} has the name {\tt \char`\"{}name\char`\"{}} and is used as the key of a character property. The value of such a property is a C-string representing the name of the character. \index{m17nCharacter@{m17nCharacter}!Mcategory@{Mcategory}}
\index{Mcategory@{Mcategory}!m17nCharacter@{m17nCharacter}}
\subsubsection[Mcategory]{\setlength{\rightskip}{0pt plus 5cm}{\bf MSymbol} {\bf Mcategory}}\label{group__m17nCharacter_gd6d719ce33cdd01171e8a3773d08af09}


Key for general category.

The symbol \doxyref{Mcategory}{p.}{group__m17nCharacter_gd6d719ce33cdd01171e8a3773d08af09} has the name {\tt \char`\"{}category\char`\"{}} and is used as the key of a character property. The value of such a property is a symbol representing the {\em general category\/} of the character.

Each symbol that represents a general category has one of the names listed as abbreviations for {\em General Category\/} in Unicode.
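
Since the category symbols are named after the Unicode General Category abbreviations (for example {\tt Lu} or {\tt Nd}), the first letter of a name identifies the major class. The following sketch shows that reading. It takes the abbreviation as a plain C string and does not call the m17n library; {\tt sketch\_\-major\_\-class()} is a hypothetical helper, not a library function.

```c
#include <stddef.h>

/* Map the first letter of a General Category abbreviation such as
   "Lu" or "Nd" to the name of its major class.  Hypothetical helper
   for illustration; returns NULL for an empty or unknown abbreviation. */
static const char *
sketch_major_class (const char *abbrev)
{
  if (abbrev == NULL || abbrev[0] == '\0')
    return NULL;
  switch (abbrev[0])
    {
    case 'L': return "Letter";
    case 'M': return "Mark";
    case 'N': return "Number";
    case 'P': return "Punctuation";
    case 'S': return "Symbol";
    case 'Z': return "Separator";
    case 'C': return "Other";
    default:  return NULL;
    }
}
```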
\index{m17nCharacter@{m17nCharacter}!Mcombining\_\-class@{Mcombining\_\-class}}
\index{Mcombining\_\-class@{Mcombining\_\-class}!m17nCharacter@{m17nCharacter}}
\subsubsection[Mcombining\_\-class]{\setlength{\rightskip}{0pt plus 5cm}{\bf MSymbol} {\bf Mcombining\_\-class}}\label{group__m17nCharacter_g6e59888c09af64ee3b20208bf1b2de6e}


Key for canonical combining class.

The symbol \doxyref{Mcombining\_\-class}{p.}{group__m17nCharacter_g6e59888c09af64ee3b20208bf1b2de6e} has the name {\tt \char`\"{}combining-class\char`\"{}} and is used as the key of a character property. The value of such a property is an integer that represents the {\em canonical combining class\/} of the character.

The meaning of each integer that represents a canonical combining class is identical to the one defined in Unicode. \index{m17nCharacter@{m17nCharacter}!Mbidi\_\-category@{Mbidi\_\-category}}
\index{Mbidi\_\-category@{Mbidi\_\-category}!m17nCharacter@{m17nCharacter}}
\subsubsection[Mbidi\_\-category]{\setlength{\rightskip}{0pt plus 5cm}{\bf MSymbol} {\bf Mbidi\_\-category}}\label{group__m17nCharacter_g35ac97a9caf868b146b1843d4c6db02f}


Key for bidi category.

The symbol \doxyref{Mbidi\_\-category}{p.}{group__m17nCharacter_g35ac97a9caf868b146b1843d4c6db02f} has the name {\tt \char`\"{}bidi-category\char`\"{}} and is used as the key of a character property. The value of such a property is a symbol that represents the {\em bidirectional category\/} of the character.

Each symbol that represents a bidirectional category has one of the names listed as types of {\em Bidirectional Category\/} in Unicode.
\index{m17nCharacter@{m17nCharacter}!Msimple\_\-case\_\-folding@{Msimple\_\-case\_\-folding}}
\index{Msimple\_\-case\_\-folding@{Msimple\_\-case\_\-folding}!m17nCharacter@{m17nCharacter}}
\subsubsection[Msimple\_\-case\_\-folding]{\setlength{\rightskip}{0pt plus 5cm}{\bf MSymbol} {\bf Msimple\_\-case\_\-folding}}\label{group__m17nCharacter_g5c971245e8af385056e6730aa6446c64}


Key for corresponding single lowercase character.

The symbol \doxyref{Msimple\_\-case\_\-folding}{p.}{group__m17nCharacter_g5c971245e8af385056e6730aa6446c64} has the name {\tt \char`\"{}simple-case-folding\char`\"{}} and is used as the key of a character property. The value of such a property is the corresponding single lowercase character that is used when comparing M-texts ignoring case.

If a character requires a complicated comparison (i.e., it cannot be compared by simply mapping it to another single character), the value of such a property is {\tt 0xFFFF}. In this case, the character has another property whose key is \doxyref{Mcomplicated\_\-case\_\-folding}{p.}{group__m17nCharacter_ge5e8271f68619d95a70930c18bc48220}. \index{m17nCharacter@{m17nCharacter}!Mcomplicated\_\-case\_\-folding@{Mcomplicated\_\-case\_\-folding}}
\index{Mcomplicated\_\-case\_\-folding@{Mcomplicated\_\-case\_\-folding}!m17nCharacter@{m17nCharacter}}
\subsubsection[Mcomplicated\_\-case\_\-folding]{\setlength{\rightskip}{0pt plus 5cm}{\bf MSymbol} {\bf Mcomplicated\_\-case\_\-folding}}\label{group__m17nCharacter_ge5e8271f68619d95a70930c18bc48220}


Key for corresponding multiple lowercase characters.

The symbol \doxyref{Mcomplicated\_\-case\_\-folding}{p.}{group__m17nCharacter_ge5e8271f68619d95a70930c18bc48220} has the name {\tt \char`\"{}complicated-case-folding\char`\"{}} and is used as the key of a character property. The value of such a property is the corresponding M-text that contains a sequence of lowercase characters to be used for comparing M-texts ignoring case.
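
The way the two folding properties work together can be sketched as follows: the simple folding is consulted first, and the complicated folding is consulted only when the simple one yields the sentinel {\tt 0xFFFF}. The sketch below does not call the m17n library. The two lookup functions are toy stand-ins with hard-coded data (ASCII letters, plus U+00DF folding to the two characters {\tt ss} as in Unicode full case folding), and every {\tt sketch\_\-} name is hypothetical. Whether the m17n database holds exactly these entries is an assumption.

```c
#include <stddef.h>

#define SKETCH_COMPLICATED 0xFFFF /* sentinel value described above */

/* Toy stand-in for the simple-case-folding property. */
static int
sketch_simple_fold (int c)
{
  if (c >= 'A' && c <= 'Z')
    return c + ('a' - 'A');
  if (c == 0x00DF)               /* LATIN SMALL LETTER SHARP S */
    return SKETCH_COMPLICATED;
  return c;
}

/* Toy stand-in for the complicated-case-folding property: a
   zero-terminated sequence of character codes, or NULL if none. */
static const int *
sketch_complicated_fold (int c)
{
  static const int sharp_s[] = { 's', 's', 0 };
  return c == 0x00DF ? sharp_s : NULL;
}

/* Store the folded form of c in out (capacity cap).  Return the number
   of codes stored, or -1 if out is too small or no folding is known. */
static int
sketch_fold (int c, int *out, int cap)
{
  int simple = sketch_simple_fold (c);
  const int *seq;
  int n = 0;

  if (simple != SKETCH_COMPLICATED)
    {
      if (cap < 1)
        return -1;
      out[0] = simple;
      return 1;
    }
  seq = sketch_complicated_fold (c);
  if (seq == NULL)
    return -1;
  while (seq[n] != 0)
    {
      if (n >= cap)
        return -1;
      out[n] = seq[n];
      n++;
    }
  return n;
}
```

Comparing two texts ignoring case then amounts to folding each character this way and comparing the resulting sequences.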
\index{m17nCharacter@{m17nCharacter}!Mcased@{Mcased}}
\index{Mcased@{Mcased}!m17nCharacter@{m17nCharacter}}
\subsubsection[Mcased]{\setlength{\rightskip}{0pt plus 5cm}{\bf MSymbol} {\bf Mcased}}\label{group__m17nCharacter_g4df1027f7239776ec28478de769f0e97}


Key for values used in case operation.

The symbol \doxyref{Mcased}{p.}{group__m17nCharacter_g4df1027f7239776ec28478de769f0e97} has the name {\tt \char`\"{}cased\char`\"{}} and is used as the key of a character property. The value of such a property is an integer value 1, 2, or 3 representing \char`\"{}cased\char`\"{}, \char`\"{}case-ignorable\char`\"{}, and both of them, respectively. See the Unicode Standard 5.0 (Section 3.13 Default Case Algorithm) for details. \index{m17nCharacter@{m17nCharacter}!Msoft\_\-dotted@{Msoft\_\-dotted}}
\index{Msoft\_\-dotted@{Msoft\_\-dotted}!m17nCharacter@{m17nCharacter}}
\subsubsection[Msoft\_\-dotted]{\setlength{\rightskip}{0pt plus 5cm}{\bf MSymbol} {\bf Msoft\_\-dotted}}\label{group__m17nCharacter_g54dd86441b0b2829c6c482d509ee02c3}


Key for values used in case operation.

The symbol \doxyref{Msoft\_\-dotted}{p.}{group__m17nCharacter_g54dd86441b0b2829c6c482d509ee02c3} has the name {\tt \char`\"{}soft-dotted\char`\"{}} and is used as the key of a character property. The value of such a property is \doxyref{Mt}{p.}{group__m17nSymbol_g8769a573efbb023b4d77f9d03babc09f} if a character has the \char`\"{}Soft\_\-Dotted\char`\"{} property, and \doxyref{Mnil}{p.}{group__m17nSymbol_g0346fc05efcccc8f11271b51c0fe3eeb} otherwise. See the Unicode Standard 5.0 (Section 3.13 Default Case Algorithm) for details.
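
Because the value 3 of the {\tt cased} property stands for both meanings at once, the three integers can be read as a two-bit mask. The sketch below shows that reading. It is an interpretation of the description of \doxyref{Mcased}{p.}{group__m17nCharacter_g4df1027f7239776ec28478de769f0e97}, not a documented guarantee, and the {\tt SKETCH\_\-} and {\tt sketch\_\-} names are hypothetical.

```c
/* Hypothetical flag names; the values mirror the integers 1 and 2
   described for the cased property, so that 3 means both. */
enum
{
  SKETCH_CASED          = 1,
  SKETCH_CASE_IGNORABLE = 2
};

/* Return nonzero if the property value marks the character as cased. */
static int
sketch_is_cased (int value)
{
  return (value & SKETCH_CASED) != 0;
}

/* Return nonzero if the property value marks it as case-ignorable. */
static int
sketch_is_case_ignorable (int value)
{
  return (value & SKETCH_CASE_IGNORABLE) != 0;
}
```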
\index{m17nCharacter@{m17nCharacter}!Mcase\_\-mapping@{Mcase\_\-mapping}}
\index{Mcase\_\-mapping@{Mcase\_\-mapping}!m17nCharacter@{m17nCharacter}}
\subsubsection[Mcase\_\-mapping]{\setlength{\rightskip}{0pt plus 5cm}{\bf MSymbol} {\bf Mcase\_\-mapping}}\label{group__m17nCharacter_gbf5314e978cea3ca60461022c03d843a}


Key for values used in case operation.

The symbol \doxyref{Mcase\_\-mapping}{p.}{group__m17nCharacter_gbf5314e978cea3ca60461022c03d843a} has the name {\tt \char`\"{}case-mapping\char`\"{}} and is used as the key of a character property. The value of such a property is a plist of three M-texts: the lowercase, titlecase, and uppercase mappings of the corresponding character. See the Unicode Standard 5.0 (Section 5.18 Case Mappings) for details. \index{m17nCharacter@{m17nCharacter}!Mblock@{Mblock}}
\index{Mblock@{Mblock}!m17nCharacter@{m17nCharacter}}
\subsubsection[Mblock]{\setlength{\rightskip}{0pt plus 5cm}{\bf MSymbol} {\bf Mblock}}\label{group__m17nCharacter_g262e95cb77fc8470863bf2ee1fc6332b}


Key for script block name.

The symbol \doxyref{Mblock}{p.}{group__m17nCharacter_g262e95cb77fc8470863bf2ee1fc6332b} has the name {\tt \char`\"{}block\char`\"{}} and is used as the key of a character property. The value of such a property is a symbol representing the script block of the corresponding character.