This is free documentation; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
References consulted:
OpenGroup's Single Unix specification
http://www.UNIX-systems.org/online.html
size_t iconv(iconv_t cd, const char **inbuf, size_t *inbytesleft, char **outbuf, size_t *outbytesleft);
For state-dependent encodings, the conversion descriptor cd is placed into its initial shift state by a call for which inbuf is a null pointer, or for which inbuf points to a null pointer. When iconv() is called in this way, and if outbuf is not a null pointer or a pointer to a null pointer, and outbytesleft points to a positive value, iconv() will place, into the output buffer, the byte sequence to change the output buffer to its initial shift state. If the output buffer is not large enough to hold the entire reset sequence, iconv() will fail and set errno to E2BIG. Subsequent calls with inbuf as other than a null pointer or a pointer to a null pointer cause the conversion to take place from the current state of the conversion descriptor.
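The reset call described above might be wrapped as in the following sketch. The helper name reset_shift_state is hypothetical and not part of the library; the sketch assumes only the iconv() interface documented here.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: return cd to its initial shift state and store any
 * reset sequence in out[0..outsize).  Returns the number of bytes written,
 * or (size_t)-1 with errno set by iconv() (E2BIG if the reset sequence
 * does not fit).
 */
size_t
reset_shift_state(iconv_t cd, char *out, size_t outsize)
{
	char *outptr = out;
	size_t outleft = outsize;

	/* A null inbuf requests the reset; inbytesleft is not consulted. */
	if (iconv(cd, NULL, NULL, &outptr, &outleft) == (size_t)-1)
		return (size_t)-1;

	/* outleft was decremented by the length of the reset sequence. */
	return outsize - outleft;
}
```

For a stateless encoding no reset sequence exists, so the helper is expected to report zero bytes written.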
If a sequence of input bytes does not form a valid character in the specified charset, conversion stops after the previous successfully converted character. If the input buffer ends with an incomplete character or shift sequence, conversion stops after the previous successfully converted bytes. If the output buffer is not large enough to hold the entire converted input, conversion stops just prior to the input bytes that would cause the output buffer to overflow. The variable pointed to by inbuf is updated to point to the byte following the last byte successfully used in the conversion. The value pointed to by inbytesleft is decremented to reflect the number of bytes still not converted in the input buffer. The variable pointed to by outbuf is updated to point to the byte following the last byte of converted output data. The value pointed to by outbytesleft is decremented to reflect the number of bytes still available in the output buffer. For state-dependent encodings, the conversion descriptor is updated to reflect the shift state in effect at the end of the last successfully converted byte sequence.
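Because iconv() advances inbuf and outbuf and decrements both byte counts, a caller can resume after E2BIG simply by supplying fresh output space. The following is a minimal sketch of that pattern, not a definitive implementation: convert_chunked and its eight-byte scratch buffer are hypothetical, chosen only to force the output-full case.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: convert in[0..inlen) through cd, draining a small
 * scratch buffer into dst each time it fills.  Returns the number of bytes
 * stored in dst, or (size_t)-1 on error (errno as set by iconv(), or E2BIG
 * if dst itself is too small).
 */
size_t
convert_chunked(iconv_t cd, const char *in, size_t inlen,
    char *dst, size_t dstsize)
{
	char scratch[8];		/* deliberately small, assumption */
	const char *inptr = in;
	size_t inleft = inlen;
	size_t total = 0;

	while (inleft > 0) {
		char *outptr = scratch;
		size_t outleft = sizeof(scratch);
		/*
		 * The (void *) cast lets this compile against both the
		 * const char ** prototype documented here and the char **
		 * prototype found on other systems.
		 */
		size_t rc = iconv(cd, (void *)&inptr, &inleft,
		    &outptr, &outleft);
		size_t produced = sizeof(scratch) - outleft;

		if (rc == (size_t)-1 && errno != E2BIG)
			return (size_t)-1;	/* EILSEQ or EINVAL */
		if (rc == (size_t)-1 && produced == 0)
			return (size_t)-1;	/* scratch too small for one character */
		if (produced > dstsize - total) {
			errno = E2BIG;
			return (size_t)-1;
		}
		memcpy(dst + total, scratch, produced);
		total += produced;
		/* inptr and inleft already point past the converted bytes. */
	}
	return total;
}
```

The loop never recomputes input offsets itself; it relies entirely on the updates iconv() makes to inptr and inleft.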
If iconv() encounters a character in the input buffer that is legal, but for which an identical character does not exist in the target charset, iconv() performs an implementation-defined conversion on this character.
EILSEQ Input conversion stopped due to an input byte that does not belong to the input charset.
E2BIG Input conversion stopped due to lack of space in the output buffer.
EINVAL Input conversion stopped due to an incomplete character or shift sequence at the end of the input buffer.
The iconv() function may fail if:
EBADF The cd argument is not a valid open conversion descriptor.
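A caller usually has to tell these errno values apart, since each calls for a different recovery. The sketch below shows one way to classify them; the enum conv_status and the helper convert_once are hypothetical names introduced only for illustration.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical classification of the outcomes listed above. */
enum conv_status {
	CONV_OK,
	CONV_BAD_INPUT,		/* EILSEQ */
	CONV_OUTPUT_FULL,	/* E2BIG */
	CONV_TRUNCATED_INPUT,	/* EINVAL */
	CONV_OTHER		/* anything else, e.g. EBADF */
};

/* Hypothetical helper: perform one iconv() call and classify the result. */
enum conv_status
convert_once(iconv_t cd, const char *in, size_t inlen,
    char *out, size_t outsize)
{
	const char *inptr = in;
	size_t inleft = inlen;
	char *outptr = out;
	size_t outleft = outsize;

	/* The (void *) cast accepts either inbuf prototype in use. */
	if (iconv(cd, (void *)&inptr, &inleft, &outptr, &outleft) !=
	    (size_t)-1)
		return CONV_OK;

	switch (errno) {
	case EILSEQ:
		return CONV_BAD_INPUT;
	case E2BIG:
		return CONV_OUTPUT_FULL;
	case EINVAL:
		return CONV_TRUNCATED_INPUT;
	default:
		return CONV_OTHER;
	}
}
```

Typical responses would be to enlarge or drain the output buffer on CONV_OUTPUT_FULL, to read more input on CONV_TRUNCATED_INPUT, and to report or skip the offending byte on CONV_BAD_INPUT.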
Regardless of the data type inferred by the converter, the size of the remaining space in both input and output objects (the inbytesleft and outbytesleft arguments) is always measured in bytes.
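The byte-count rule matters most when the output is an array of wider elements. In the following sketch the helper decode_to_ucs4 is hypothetical, and the descriptor is assumed to have been opened with a 4-byte-per-character target charset in the host's byte order.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode in[0..inlen) into an array of 32-bit code
 * units.  Returns the number of array elements stored, or (size_t)-1 on
 * error with errno set by iconv().
 */
size_t
decode_to_ucs4(iconv_t cd, const char *in, size_t inlen,
    uint32_t *out, size_t outcount)
{
	const char *inptr = in;
	size_t inleft = inlen;
	char *outptr = (char *)out;
	size_t outbytes = outcount * sizeof(uint32_t);	/* bytes, not elements */
	size_t outleft = outbytes;

	/* The (void *) cast accepts either inbuf prototype in use. */
	if (iconv(cd, (void *)&inptr, &inleft, &outptr, &outleft) ==
	    (size_t)-1)
		return (size_t)-1;

	/* Convert the consumed byte count back into an element count. */
	return (outbytes - outleft) / sizeof(uint32_t);
}
```

Passing outcount directly as the byte count would make the converter believe the buffer is a quarter of its real size.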
Coded character sets (CCS)
Each CCS file contains tables for conversion between exactly one character of the corresponding charset and one UCS-4 character, in both directions. About 200 character sets are supported (only those used in the FreeBSD distribution are provided in this package), including ASCII and the following standards: ISO-8859, KOI8, Windows, IBM-DOS, Macintosh, CJK national charsets and EBCDIC. CCS files are accessed via memory mapping.
Character encoding schemes (CES)
Each CES module contains functions converting a byte sequence of the corresponding encoding scheme to exactly one UCS-4 32-bit character, and vice versa, a UCS-4 character to a byte sequence of the CES. The following CES groups are supported in iconv-1.0: ISO-10646 (UCS-4 and UCS-2, each in both architecture-independent (network) and architecture-dependent (internal) byte order versions), Unicode (UTF-16, UTF-8 and UTF-7), ISO-2022, and Extended Unix Code (EUC) for Chinese (CN and TW), Japanese and Korean. A special table-driven CES module providing conversion for all CCS tables is always built into the library. The ISO-2022, EUC and table-driven modules use one or more memory-mapped CCS tables.
Any CCS table or CES module can be built into the library at compilation time.
A CCS or CES charset can have zero or more aliases (alternative names), which are listed in the charset.aliases file located in the same directory as the CCS tables. The library maps the aliases file into memory to find canonical charset names.
If iconv() encounters a character in the input buffer that is legal, but for which an identical character does not exist in the target charset, iconv() replaces the source character with the '_' (underscore) character and tries to convert it into the target charset. If there is no underscore character in the target charset, no bytes are written to the target buffer for the source character. In any case, iconv() increments the number of non-identical conversions performed (the value being returned as the function result).
@@TABLE_DIR@@/charset.aliases  Charset aliases file
@@TABLE_DIR@@/*.cct            CCS conversion tables
@@MODULE_DIR@@/*.so            CES conversion modules