Name             Date         Size      #Lines  LOC
..               11-Oct-2012  -
ccs/             11-Oct-2012  -         90,578  90,167
ces/             11-Oct-2012  -         1,294   831
lib/             11-Oct-2012  -         3,327   2,261
AUTHORS          11-Oct-2012  53        2       1
COPYING          11-Oct-2012  1.4 KiB   28      27
Makefile.am      11-Oct-2012  650       40      22
Makefile.in      11-Oct-2012  10.6 KiB  383     297
README.ORIGINAL  11-Oct-2012  3.3 KiB   69      64
README.TODO      11-Oct-2012  1.7 KiB   33      26
charset.aliases  11-Oct-2012  1.6 KiB   39      38
iconv.tex        11-Oct-2012  11.6 KiB  433     371

README.ORIGINAL

	ICONV - Charset Conversion Library. Version 2.0
	-----------------------------------------------

This distribution provides:
	* the library (libiconv.a and .so) for conversion between
	  various charsets (character encoding schemes);
	* and the command line utility (iconv), providing
	  conversion of a file, standard input or its argument
	  line from one charset to another;
	* a set of coded character set tables (binary files) and
	  character encoding schemes (dynamically loaded modules)
	  for use by the library;
	* a utility for creating character set tables from Unicode
	  conversion tables and RFC1345-style charset descriptions.

Syntax of the library functions (iconv_open, iconv, iconv_close)
and the utility is described in the man pages.

Features of the library:
- Coded character set (CCS) tables are binary files containing
  pairs of tables for converting characters from some charset to
  Unicode (UCS-2 in host byte order) and vice versa. There are 4
  types of tables supported in iconv-2.0: for 7-bit, 8-bit, 14-bit
  and 16-bit charsets. The library uses memory mapping (in
  read-only mode) to access the table data.
- Character encoding schemes (CES) are small sets of C structures
  and functions. The functions implement virtual methods for
  converting a sequence of characters in some charset to a Unicode
  character (UCS-4 in host byte order). Each encoding scheme is
  located in a separate C file and can be compiled to a dynamically
  loaded shared module.
- A universal CES for all table-driven charsets is compiled into
  the library and used for all CCS tables.
- Both CCS tables and CES C code can be built into the library by
  specifying the corresponding charset name in the
  ICONV_BUILTIN_CHARSETS make variable. By default us-ascii, utf-8
  and ucs-4-internal are built in (plus the CES for all CCS
  tables). All the CES modules are included in the static version
  of the library (libiconv.a).
- Multiple aliases for every charset are supported. All aliases are
  listed in the charset.aliases file(s). The library uses memory
  mapping to parse alias information and find the canonical name
  of a charset before looking it up in the internal list or
  external table or shared module. Alias information can also be
  compiled into the library (which is useful for compiled-in
  charsets ;-)
- ISO/IEC 10646 conformance of the internal representation of
  characters; conversion is done in two steps:
  (1) a sequence of zero or more bytes from the input buffer coded
      in the source charset is converted to exactly one valid UCS-4
      character and
  (2) the UCS-4 character is converted to a sequence of zero or
      more bytes in the target charset in the output buffer.
  When two charset names are found to be aliases of the same
  charset, conversion is done via a simplified converter that
  copies the data from the input buffer to the output one.
- Open module API: adding new modules is easy. So far the API has
  only been documented via comments in the iconv.h file. A Perl
  utility is provided for conversion of Unicode charset tables
  (http://www.unicode.org/Public/MAPPINGS/) and RFC1345-style
  charset tables into the CCS format recognized by the library.
- API conformance to the Unix98 specification.
- BSD-style copyright.

				Konstantin Chuguev
				<Konstantin.Chuguev@dante.org.uk>
				November 2000.


README.TODO

1. The newlib/iconv/ccs/iconv_mktbl Perl script should be upgraded. Currently
   this script can only generate Big Endian (Network Byte Order) .cct files.
   This decreases conversion performance on little endian systems, since the
   iconv library needs to swap all bytes that it reads from a loaded CCS
   table. Something like -LE and -BE options should be added to the
   'iconv_mktbl' script.

   Also, we could keep two versions of each .cct file, BE and LE (e.g.,
   koi8_r-le.cct and koi8_r-be.cct), and the iconv library would
   automatically choose the one it needs.

   Or we could keep both LE and BE data in one .cct file.

2. http://www.dante.net/staff/konstantin/FreeBSD/iconv/ contains additional
   CES and CCS converters (see iconv-extra-2.0.tar.gz and
   iconv-rfc1345-2.0.tar.gz). These extra converters should be added too.

3. Documentation should be created. It should contain:
   1) How to compile iconv (description of the configure script options)
   2) How to add a new converter
   3) Work principles.
   It would be nice if the iconv architecture were described too.

4. CCS file loading (iconv/lib/loaddata.c). Currently the file is loaded into
   memory for every iconv descriptor. For example, if one uses two iconv
   descriptors for UTF-8->KOI8-R and UTF-16->KOI8-R conversions, the
   koi8_r.cct file will be loaded twice. To save memory, we should load each
   .cct file only once (if possible).
                                             Artem B. Bityuckiy,
                                             SoftMine Corporation,
                                             <abitytsky@softminecorp.com>,
                                             <dedekind@mail.ru>,
                                             Jan, 2004.
