1SAX.setDocumentLocator() 2SAX.startDocument() 3SAX.internalSubset(kanjidic2, , ) 4SAX.comment( Version 1.3 5 This is the DTD of the XML-format kanji file combining information from 6 the KANJIDIC and KANJD212 files. It is intended to be largely self- 7 documenting, with each field being accompanied by an explanatory 8 comment. 9 10 The file covers the following kanji: 11 (a) the 6,355 kanji from JIS X 0208; 12 (b) the 5,801 kanji from JIS X 0212; 13 (c) the 3,625 kanji from JIS X 0213 as follows: 14 (i) the 2,741 kanji which are also in JIS X 0212 have 15 JIS X 0213 code-points (kuten) added to the existing entry; 16 (ii) the 884 "new" kanji have new entries. 17 18 At the end of the explanation for a number of fields there is a tag 19 with the format [N]. This indicates the leading letter(s) of the 20 equivalent field in the KANJIDIC and KANJD212 files. 21 22 The KANJIDIC documentation should also be read for additional 23 information about the information in the file. 24 ) 25SAX.elementDecl(kanjidic2, 4, ...) 26SAX.elementDecl(header, 4, ...) 27SAX.comment( 28 The single header element will contain identification information 29 about the version of the file 30 ) 31SAX.elementDecl(file_version, 3, ...) 32SAX.comment( 33 This field denotes the version of kanjidic2 structure, as more 34 than one version may exist. 35 ) 36SAX.elementDecl(database_version, 3, ...) 37SAX.comment( 38 The version of the file, in the format YYYY-NN, where NN will be 39 a number starting with 01 for the first version released in a 40 calendar year, then increasing for each version in that year. 41 ) 42SAX.elementDecl(date_of_creation, 3, ...) 43SAX.comment( 44 The date the file was created in international format (YYYY-MM-DD). 45 ) 46SAX.elementDecl(character, 4, ...) 47SAX.elementDecl(literal, 3, ...) 48SAX.comment( 49 The character itself in UTF8 coding. 50 ) 51SAX.elementDecl(codepoint, 4, ...) 52SAX.comment( 53 The codepoint element states the code of the character in the various 54 character set standards. 55 ) 56SAX.elementDecl(cp_value, 3, ...) 57SAX.comment( 58 The cp_value contains the codepoint of the character in a particular 59 standard. The standard will be identified in the cp_type attribute. 60 ) 61SAX.attributeDecl(cp_value, cp_type, 1, 2, NULL, ...) 62SAX.comment( 63 The cp_type attribute states the coding standard applying to the 64 element. The values assigned so far are: 65 jis208 - JIS X 0208-1997 - kuten coding (nn-nn) 66 jis212 - JIS X 0212-1990 - kuten coding (nn-nn) 67 jis213 - JIS X 0213-2000 - kuten coding (p-nn-nn) 68 ucs - Unicode 4.0 - hex coding (4 or 5 hexadecimal digits) 69 ) 70SAX.elementDecl(radical, 4, ...) 71SAX.elementDecl(rad_value, 3, ...) 72SAX.comment( 73 The radical number, in the range 1 to 214. The particular 74 classification type is stated in the rad_type attribute. 75 ) 76SAX.attributeDecl(rad_value, rad_type, 1, 2, NULL, ...) 77SAX.comment( 78 The rad_type attribute states the type of radical classification. 79 classical - as recorded in the KangXi Zidian. 80 nelson - as used in the Nelson "Modern Japanese-English 81 Character Dictionary" (i.e. the Classic, not the New Nelson). 82 This will only be used where Nelson reclassified the kanji. 83 ) 84SAX.elementDecl(misc, 4, ...) 85SAX.elementDecl(grade, 3, ...) 86SAX.comment( 87 The Jouyou Kanji grade level. 1 through 6 indicate the grade in which 88 the kanji is taught in Japanese schools. 8 indicates it is one of the 89 remaining Jouyou Kanji to be learned in junior high school, and 9 90 indicates it is a Jinmeiyou (for use in names) kanji. [G] 91 ) 92SAX.elementDecl(stroke_count, 3, ...) 93SAX.comment( 94 The stroke count of the kanji, including the radical. If more than 95 one, the first is considered the accepted count, while subsequent ones 96 are common miscounts. (See Appendix E. of the KANJIDIC documentation 97 for some of the rules applied when counting strokes in some of the 98 radicals.) [S] 99 ) 100SAX.elementDecl(variant, 3, ...) 101SAX.comment( 102 A cross-reference code to another kanji, usually regarded as a variant. 103 The type of cross-reference is given in the var_type attribute. 104 ) 105SAX.attributeDecl(variant, var_type, 1, 2, NULL, ...) 106SAX.comment( 107 The var_type attribute indicates the type of variant code. The current 108 values are: 109 jis208 - in JIS X 0208 - kuten coding 110 jis212 - in JIS X 0212 - kuten coding 111 jis213 - in JIS X 0213 - kuten coding 112 deroo - De Roo number - numeric 113 njecd - Halpern NJECD index number - numeric 114 s_h - The Kanji Dictionary (Spahn & Hadamitzky) - descriptor 115 nelson - "Classic" Nelson - numeric 116 oneill - Japanese Names (O'Neill) - numeric 117 ) 118SAX.elementDecl(freq, 3, ...) 119SAX.comment( 120 A frequency-of-use ranking. The 2,500 most-used characters have a 121 ranking; those characters that lack this field are not ranked. The 122 frequency is a number from 1 to 2,500 that expresses the relative 123 frequency of occurrence of a character in modern Japanese. This is 124 based on a survey in newspapers, so it is biassed towards kanji 125 used in newspaper articles. The discrimination between the less 126 frequently used kanji is not strong. 127 ) 128SAX.elementDecl(rad_name, 3, ...) 129SAX.comment( 130 When the kanji is itself a radical and has a name, this element 131 contains the name (in hiragana.) [T2] 132 ) 133SAX.elementDecl(dic_number, 4, ...) 134SAX.comment( 135 This element contains the index numbers and similar unstructured 136 information such as page numbers in a number of published dictionaries, 137 and instructional books on kanji. 138 ) 139SAX.elementDecl(dic_ref, 3, ...) 140SAX.comment( 141 Each dic_ref contains an index number. The particular dictionary, 142 etc. is defined by the dr_type attribute. 143 ) 144SAX.attributeDecl(dic_ref, dr_type, 1, 2, NULL, ...) 145SAX.comment( 146 The dr_type defines the dictionary or reference book, etc. to which 147 dic_ref element applies. The initial allocation is: 148 nelson_c - "Modern Reader's Japanese-English Character Dictionary", 149 edited by Andrew Nelson (now published as the "Classic" 150 Nelson). 151 nelson_n - "The New Nelson Japanese-English Character Dictionary", 152 edited by John Haig. 153 halpern_njecd - "New Japanese-English Character Dictionary", 154 edited by Jack Halpern. 155 halpern_kkld - "Kanji Learners Dictionary" (Kodansha) edited by 156 Jack Halpern. 157 heisig - "Remembering The Kanji" by James Heisig. 158 gakken - "A New Dictionary of Kanji Usage" (Gakken) 159 oneill_names - "Japanese Names", by P.G. O'Neill. 160 oneill_kk - "Essential Kanji" by P.G. O'Neill. 161 moro - "Daikanwajiten" compiled by Morohashi. For some kanji two 162 additional attributes are used: m_vol: the volume of the 163 dictionary in which the kanji is found, and m_page: the page 164 number in the volume. 165 henshall - "A Guide To Remembering Japanese Characters" by 166 Kenneth G. Henshall. 167 sh_kk - "Kanji and Kana" by Spahn and Hadamitzky. 168 sakade - "A Guide To Reading and Writing Japanese" edited by 169 Florence Sakade. 170 henshall3 - "A Guide To Reading and Writing Japanese" 3rd 171 edition, edited by Henshall, Seeley and De Groot. 172 tutt_cards - Tuttle Kanji Cards, compiled by Alexander Kask. 173 crowley - "The Kanji Way to Japanese Language Power" by 174 Dale Crowley. 175 kanji_in_context - "Kanji in Context" by Nishiguchi and Kono. 176 busy_people - "Japanese For Busy People" vols I-III, published 177 by the AJLT. The codes are the volume.chapter. 178 kodansha_compact - the "Kodansha Compact Kanji Guide". 179 ) 180SAX.attributeDecl(dic_ref, m_vol, 1, 3, NULL, ...) 181SAX.comment( 182 See above under "moro". 183 ) 184SAX.attributeDecl(dic_ref, m_page, 1, 3, NULL, ...) 185SAX.comment( 186 See above under "moro". 187 ) 188SAX.elementDecl(query_code, 4, ...) 189SAX.comment( 190 These codes contain information relating to the glyph, and can be used 191 for finding a required kanji. The type of code is defined by the 192 qc_type attribute. 193 ) 194SAX.elementDecl(q_code, 3, ...) 195SAX.comment( 196 The q_code contains the actual query-code value, according to the 197 qc_type attribute. 198 ) 199SAX.attributeDecl(q_code, qc_type, 1, 2, NULL, ...) 200SAX.comment( 201 The q_code attribute defines the type of query code. The current values 202 are: 203 skip - Halpern's SKIP (System of Kanji Indexing by Patterns) 204 code. The format is n-nn-nn. See the KANJIDIC documentation 205 for a description of the code and restrictions on the 206 commercial use of this data. [P] 207 208 sh_desc - the descriptor codes for The Kanji Dictionary (Tuttle 209 1996) by Spahn and Hadamitzky. They are in the form nxnn.n, 210 e.g. 3k11.2, where the kanji has 3 strokes in the 211 identifying radical, it is radical "k" in the SH 212 classification system, there are 11 other strokes, and it is 213 the 2nd kanji in the 3k11 sequence. (I am very grateful to 214 Mark Spahn for providing the list of these descriptor codes 215 for the kanji in this file.) [I] 216 four_corner - the "Four Corner" code for the kanji. This is a code 217 invented by Wang Chen in 1928. See the KANJIDIC documentation 218 for an overview of the Four Corner System. [Q] 219 220 deroo - the codes developed by the late Father Joseph De Roo, and 221 published in his book "2001 Kanji" (Bojinsha). Fr De Roo 222 gave his permission for these codes to be included. [DR] 223 misclass - a possible misclassification of the kanji according 224 to one of the code types. (See the "Z" codes in the KANJIDIC 225 documentation for more details.) 226 227 ) 228SAX.elementDecl(reading_meaning, 4, ...) 229SAX.comment( 230 The readings for the kanji in several languages, and the meanings, also 231 in several languages. The readings and meanings are grouped to enable 232 the handling of the situation where the meaning is differentiated by 233 reading. [T1] 234 ) 235SAX.elementDecl(nanori, 3, ...) 236SAX.comment( 237 Japanese readings that are now only associated with names. 238 ) 239SAX.elementDecl(rmgroup, 4, ...) 240SAX.elementDecl(reading, 3, ...) 241SAX.comment( 242 The reading element contains the reading or pronunciation 243 of the kanji. 244 ) 245SAX.attributeDecl(reading, r_type, 1, 2, NULL, ...) 246SAX.comment( 247 The r_type attribute defines the type of reading in the reading 248 element. The current values are: 249 pinyin - the modern PinYin romanization of the Chinese reading 250 of the kanji. The tones are represented by a concluding 251 digit. [Y] 252 korean_r - the romanized form of the Korean reading(s) of the 253 kanji. The readings are in the (Republic of Korea) Ministry 254 of Education style of romanization. [W] 255 korean_h - the Korean reading(s) of the kanji in hangul. 256 ja_on - the "on" Japanese reading of the kanji, in katakana. A 257 second attribute r_status, if present, will indicate with 258 a value of "jy" whether the reading is approved for a 259 "Jouyou kanji". 260 ja_kun - the "kun" Japanese reading of the kanji, in hiragana. 261 Where relevant the okurigana is also included separated by a 262 ".". Readings associated with prefixes and suffixes are 263 marked with a "-". A second attribute r_status, if present, 264 will indicate with a value of "jy" whether the reading is 265 approved for a "Jouyou kanji". 266 ) 267SAX.attributeDecl(reading, r_status, 1, 3, NULL, ...) 268SAX.comment( 269 See under ja_on and ja_kun above. 270 ) 271SAX.elementDecl(meaning, 3, ...) 272SAX.comment( 273 The meaning associated with the kanji. 274 ) 275SAX.attributeDecl(meaning, m_lang, 1, 3, NULL, ...) 276SAX.comment( 277 The m_lang attribute defines the target language of the meaning. It 278 will be coded using the two-letter language code from the ISO 639 279 standard. When absent, the value "en" (i.e. English) is implied. [{}] 280 ) 281SAX.externalSubset(kanjidic2, , ) 282SAX.startElement(kanjidic2) 283SAX.characters( 284, 1) 285SAX.endElement(kanjidic2) 286SAX.endDocument() 287