package Encode::JIS2K;
our $VERSION = sprintf "%d.%02d", q$Revision: 0.03 $ =~ /(\d+)/g;

use Encode;
use XSLoader;
XSLoader::load(__PACKAGE__,$VERSION);

use Encode::JIS2K::2022JP3;

Encode::define_alias(qr/\beuc.*jp[ \-]?(?:2000|2k)$/i => '"euc-jisx0213"');
Encode::define_alias(qr/\bjp.*euc[ \-]?(2000|2k)$/i   => '"euc-jisx0213"');
Encode::define_alias(qr/\bujis[ \-]?(?:2000|2k)$/i    => '"euc-jisx0213"');

Encode::define_alias(qr/\bshift.*jis(?:2000|2k)$/i => '"shiftjisx0213"');
Encode::define_alias(qr/\bsjis[ \-]?(?:2000|2k)$/i => '"shiftjisx0213"');


1;
__END__

=head1 NAME

Encode::JIS2K - JIS X 0213 (aka JIS 2000) Encodings

=head1 SYNOPSIS

  use Encode::JIS2K;
  use Encode qw/encode decode/;
  $euc_2k = encode("euc-jisx0213", $utf8);
  $utf8   = decode("euc-jisx0213", $euc_2k);

=head1 ABSTRACT

This module implements encodings that cover the JIS X 0213 charset (AKA
JIS 2000, hence the module name).  The supported encodings are as follows.

  Canonical      Alias                                Description
  --------------------------------------------------------------------
  euc-jisx0213   qr/\beuc.*jp[ \-]?(?:2000|2k)$/i     EUC-JISX0213
                 qr/\bjp.*euc[ \-]?(2000|2k)$/i
                 qr/\bujis[ \-]?(?:2000|2k)$/i
  shiftjisx0213  qr/\bshift.*jis(?:2000|2k)$/i        Shift_JISX0213
                 qr/\bsjis[ \-]?(?:2000|2k)$/i

  iso-2022-jp-3
  jis0213-1-raw                                       JIS X 0213 plane 1, raw format
  jis0213-2-raw                                       JIS X 0213 plane 2, raw format
  --------------------------------------------------------------------

=head1 DESCRIPTION

To find out how to use this module in detail, see L<Encode>.

=head1 What is JIS X 0213 anyway?

Simply put, JIS X 0213 is a rework and reorganization of JIS X 0208
and JIS X 0212.  It consists of two 94x94 planes, which roughly
correspond to the older standards as follows:

  JIS X 0213 Plane 1 = JIS X 0208 + extension
  JIS X 0213 Plane 2 = JIS X 0212 reorganized + extension

Here is the character repertoire of each at a glance.

                      # of codepoints    Kuten Ku (rows) used
  --------------------------------------------------------
  JIS X 0208          6,879              1..8,16..83
  JIS X 0213-1        8,762              1..94 (all!)
  JIS X 0212          6,067              2,6..7,9..11,16..77
  JIS X 0213-2        2,436              1,3..5,8,12..15,78..94
  --------------------------------------------------------
  (JIS X 0213 Total)  11,197

JIS X 0213 was designed to extend JIS X 0208 and JIS X 0212 without
becoming incompatible with (classic) EUC-JP and Shift_JIS.  The
following characteristics result from that design.

=over 2

=item *

JIS X 0213 plane 1 is (almost) a superset of JIS X 0208.  However, with
Unicode 3.2.0 the mappings differ in 3 codepoints.

  Kuten   JIS X 0208 -> Unicode          JIS X 0213 -> Unicode
  --------------------------------------------------------------
  1-1-17  <UFFE3> # FULLWIDTH MACRON     <U203E> # OVERLINE
  1-1-29  <U2014> # EM DASH              <U2015> # HORIZONTAL BAR
  1-1-79  <UFFE5> # FULLWIDTH YEN SIGN   <U00A5> # YEN SIGN

=item *

By the same token, JIS X 0213 plane 2 contains JIS Dai-4 Suijun Kanji
(JIS Kanji Repertoire Level 4).  Plane 2 uses only the rows that
JIS X 0212 leaves unused, which allows EUC-JP's G3 to contain both
JIS X 0212 and JIS X 0213 plane 2.

However, JIS X 0212:1990 already contains many of the Dai-4 Suijun
Kanji, so EUC's G3 may end up containing duplicate mappings.

=item *

Because of Halfwidth Katakana, Shift_JIS mapping has always been
tricky, and with JIS X 0213 it is even trickier.  Here is a regex that
matches a Shift_JISX0213 sequence (note: you have to "use bytes" to
make it work!)

  $re_valid_shiftjisx0213 =
    qr/^(?:
         [\x00-\x7f] |                            # ASCII or
         [\xa1-\xdf] |                            # JIS X 0201 KANA or
         [\x81-\x9f\xe0-\xfc][\x40-\x7e\x80-\xfc] # JIS X 0213
       )+$/xo;

=back

=head2 Note on EUC-JISX0213 (vs. EUC-JP)

As of Encode-1.64, 'euc-jp' does support euc-jisx0213 for decoding.
However, 'euc-jp' in Encode and 'euc-jisx0213' differ as follows:

                    euc-jp                    euc-jisx0213
  --------------------------------------------------------------
  Decodes....       (0201-K|0208|0212|0213)   ditto
  Round-Trip (|0)   (0201-K|0208|0212)        JIS X (0201-K|0213)
  Decode Only (|3)  those only found in 0213  those only found in 0212
  --------------------------------------------------------------

=head1 AUTHORS

Dan Kogai E<lt>dankogai@dan.co.jpE<gt>

=head1 COPYRIGHT

Copyright 2002 by Dan Kogai E<lt>dankogai@dan.co.jpE<gt>.

This program is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.

See L<http://www.perl.com/perl/misc/Artistic.html>

=head1 SEE ALSO

L<Encode>, L<Encode::JP>

Japanese Graphic Character Set for Information Interchange -- Plane 1
L<http://www.itscj.ipsj.or.jp/ISO-IR/228.pdf>

Japanese Graphic Character Set for Information Interchange -- Plane 2
L<http://www.itscj.ipsj.or.jp/ISO-IR/229.pdf>

=cut