iconv-2.0/lib/iconv.3.in

     Copyright (c) Konstantin Chuguev <Konstantin.Chuguev@dante.org.uk>

 This is free documentation; you can redistribute it and/or
 modify it under the terms of the GNU General Public License as
 published by the Free Software Foundation; either version 2 of
 the License, or (at your option) any later version.

 References consulted:
 OpenGroup's Single Unix specification
 http://www.UNIX-systems.org/online.html

 iconv 3 "7 Sep 2000"
 NAME
iconv - charset conversion function

 SYNOPSIS
 #include <iconv.h>

 size_t iconv(iconv_t cd, const char **inbuf, size_t *inbytesleft, char **outbuf, size_t *outbytesleft);

 DESCRIPTION
The iconv() function converts the sequence of characters from one charset,
in the array specified by inbuf, into a sequence of corresponding characters
in another charset, in the array specified by outbuf. The charsets are those
specified in the iconv_open() call that returned the conversion descriptor,
cd. The inbuf argument points to a variable that points to the first
character in the input buffer, and inbytesleft indicates the number of bytes
to the end of the buffer to be converted. The outbuf argument points to a
variable that points to the first available byte in the output buffer, and
outbytesleft indicates the number of available bytes to the end of the
buffer.
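
As an illustration of the calling convention described above, the following
is a minimal sketch of a single conversion call. The helper name
latin1_to_utf8 is hypothetical, and the charset names "ISO-8859-1" and
"UTF-8" are assumed to be known to the installed library. The cast through
void * is only there because some implementations declare inbuf as char **
instead of const char **.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert srclen bytes of ISO-8859-1 text to UTF-8.
 * Returns the number of bytes written to dst, or (size_t)-1 on failure. */
size_t latin1_to_utf8(const char *src, size_t srclen,
                      char *dst, size_t dstsize)
{
    assert(src != NULL && dst != NULL);

    /* iconv_open() takes the target charset first, then the source. */
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    const char *in = src;      /* iconv() advances this pointer */
    char *out = dst;           /* ... and this one */
    size_t inleft = srclen;    /* bytes still to be converted */
    size_t outleft = dstsize;  /* bytes still free in dst */

    /* void * converts implicitly to either char ** or const char **. */
    size_t rc = iconv(cd, (void *)&in, &inleft, &out, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return dstsize - outleft;
}
```

The caller passes lengths explicitly, so neither buffer has to be
null-terminated.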

For state-dependent encodings, the conversion descriptor cd is placed into
its initial shift state by a call for which inbuf is a null pointer, or for
which inbuf points to a null pointer. When iconv() is called in this way,
and if outbuf is not a null pointer or a pointer to a null pointer, and
outbytesleft points to a positive value, iconv() will place, into the output
buffer, the byte sequence to change the output buffer to its initial shift
state. If the output buffer is not large enough to hold the entire reset
sequence, iconv() will fail and set errno to E2BIG. Subsequent calls with
inbuf as other than a null pointer or a pointer to a null pointer cause the
conversion to take place from the current state of the conversion descriptor.
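
The reset call can be sketched as follows. The helper name
convert_and_reset and its parameters are hypothetical; the charset names are
supplied by the caller, so nothing here assumes that a particular
state-dependent encoding is installed.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert src, then ask iconv() to emit whatever
 * byte sequence returns the output to its initial shift state.
 * Returns the total number of bytes written, or (size_t)-1 on failure. */
size_t convert_and_reset(const char *to, const char *from,
                         const char *src, size_t srclen,
                         char *dst, size_t dstsize)
{
    assert(to != NULL && from != NULL && src != NULL && dst != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    const char *in = src;
    char *out = dst;
    size_t inleft = srclen;
    size_t outleft = dstsize;

    size_t rc = iconv(cd, (void *)&in, &inleft, &out, &outleft);
    if (rc != (size_t)-1) {
        /* inbuf == NULL with a usable outbuf: write the reset sequence
         * (if the target encoding has one) and reset the descriptor. */
        rc = iconv(cd, NULL, NULL, &out, &outleft);
    }
    iconv_close(cd);

    return rc == (size_t)-1 ? (size_t)-1 : dstsize - outleft;
}
```

For a stateless target charset the second call writes nothing; for a
state-dependent one such as the ISO-2022 family it may append an escape
sequence, so the output buffer needs room for it.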

If a sequence of input bytes does not form a valid character
in the specified charset, conversion stops
after the previous successfully converted character.
If the input buffer ends with an incomplete character or shift sequence,
conversion stops after the previous successfully converted bytes.
If the output buffer is not large enough to hold the entire converted
input, conversion stops just prior to the input bytes that would cause the
output buffer to overflow.
The variable pointed to by
 inbuf is updated to point to the byte following the last byte successfully
used in the conversion. The
value pointed to by
 inbytesleft is decremented to reflect the number of bytes still not converted in
the input buffer.
The variable pointed to by
 outbuf is updated to point to the byte following the last byte of converted
output data.
The value pointed to by
 outbytesleft is decremented to reflect the number of bytes still available in the
output buffer.
For state-dependent encodings, the conversion descriptor is updated
to reflect the shift state in effect at the end of the last
successfully converted byte sequence.
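
Because inbuf, inbytesleft, outbuf and outbytesleft are all updated on a
partial conversion, a caller can resume exactly where the previous call
stopped. The following is a sketch of one common pattern, growing the output
buffer whenever the call stops for lack of space. The helper name
convert_alloc and the initial capacity of 16 bytes are arbitrary choices for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical helper: convert all of src with an already opened
 * descriptor, doubling the output buffer each time iconv() stops with
 * E2BIG. Returns a malloc'd buffer and stores its length in *outlen,
 * or returns NULL on any other failure. */
char *convert_alloc(iconv_t cd, const char *src, size_t srclen,
                    size_t *outlen)
{
    assert(src != NULL && outlen != NULL);

    size_t cap = 16;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    const char *in = src;
    size_t inleft = srclen;
    char *out = buf;
    size_t outleft = cap;

    while (inleft > 0) {
        size_t rc = iconv(cd, (void *)&in, &inleft, &out, &outleft);
        if (rc != (size_t)-1)
            break;                      /* everything was converted */
        if (errno != E2BIG) {           /* EILSEQ, EINVAL, ... */
            free(buf);
            return NULL;
        }

        /* in and inleft already describe the unconverted tail; only
         * the output side has to be re-based after realloc(). */
        size_t used = (size_t)(out - buf);
        char *grown = realloc(buf, cap * 2);
        if (grown == NULL) {
            free(buf);
            return NULL;
        }
        buf = grown;
        cap *= 2;
        out = buf + used;
        outleft = cap - used;
    }

    *outlen = (size_t)(out - buf);
    return buf;
}
```

The caller owns the returned buffer and releases it with free().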

If iconv() encounters a character in the input buffer that is legal, but for
which an identical character does not exist in the target charset, iconv()
performs an implementation-defined conversion on this character.

 RETURN VALUES
The iconv() function updates the variables pointed to by the arguments to
reflect the extent of the conversion and returns the number of non-identical
conversions performed. If the entire string in the input buffer is converted,
the value pointed to by inbytesleft will be 0. If the input conversion is
stopped due to any of the conditions mentioned above, the value pointed to by
inbytesleft will be non-zero and errno is set to indicate the condition. If
an error occurs, iconv() returns (size_t)-1 and sets errno to indicate the
error.

 ERRORS
The iconv() function will fail if:

 EILSEQ   Input conversion stopped due to an input byte that does not belong
          to the input charset.

 E2BIG    Input conversion stopped due to lack of space in the output buffer.

 EINVAL   Input conversion stopped due to an incomplete character or shift
          sequence at the end of the input buffer.

The iconv() function may fail if:

 EBADF    The cd argument is not a valid open conversion descriptor.
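
A caller usually wants to react differently to each of the conditions
listed above. The following sketch maps errno to a small status code. The
enum conv_status and the function convert_once are hypothetical names
introduced only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical status codes, one per documented error condition. */
enum conv_status {
    CONV_OK,               /* whole input converted */
    CONV_BAD_INPUT,        /* EILSEQ: invalid byte sequence */
    CONV_OUTPUT_FULL,      /* E2BIG: output buffer exhausted */
    CONV_TRUNCATED_INPUT,  /* EINVAL: input ends mid-character */
    CONV_OTHER             /* EBADF or anything else */
};

/* Run one iconv() call and classify the outcome. The four in/out
 * arguments are updated exactly as iconv() itself updates them. */
enum conv_status convert_once(iconv_t cd,
                              const char **in, size_t *inleft,
                              char **out, size_t *outleft)
{
    assert(in != NULL && inleft != NULL && out != NULL && outleft != NULL);

    if (iconv(cd, (void *)in, inleft, out, outleft) != (size_t)-1)
        return CONV_OK;

    switch (errno) {
    case EILSEQ: return CONV_BAD_INPUT;
    case E2BIG:  return CONV_OUTPUT_FULL;
    case EINVAL: return CONV_TRUNCATED_INPUT;
    default:     return CONV_OTHER;
    }
}
```

A typical caller would skip or replace the offending bytes on
CONV_BAD_INPUT, drain or enlarge the output buffer on CONV_OUTPUT_FULL, and
read more input before retrying on CONV_TRUNCATED_INPUT.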

 APPLICATION USAGE
The inbuf argument indirectly points to the memory area which contains the
conversion input data. The outbuf argument indirectly points to the memory
area which is to contain the result of the conversion. The objects indirectly
pointed to by inbuf and outbuf are not restricted to containing data that is
directly representable in the ISO C language char data type. The type of
inbuf and outbuf, char **, does not imply that the objects pointed to are
interpreted as null-terminated C strings or arrays of characters. Any
interpretation of a byte sequence that represents a character in a given
character set encoding scheme is done internally within the codeset
converters. For example, the area pointed to indirectly by inbuf and/or
outbuf can contain all zero octets that are not interpreted as string
terminators but as coded character data according to the respective codeset
encoding scheme. The type of the data (char, short int, long int, and so on)
read or stored in the objects is not specified, but may be inferred for both
the input and output data by the converters determined by the from_charset
and to_charset arguments of iconv_open(). Regardless of the data type
inferred by the converter, the size of the remaining space in both input and
output objects (the inbytesleft and outbytesleft arguments) is always
measured in bytes.
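
The point about zero octets and byte counts can be seen with a 32-bit target
encoding. The sketch below is illustrative only: the helper name
utf8_to_ucs4le is hypothetical, and the charset name "UCS-4LE" is an
assumption about what the installed library accepts for little-endian UCS-4.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert UTF-8 to little-endian UCS-4.
 * dstbytes is the size of dst in BYTES, not in characters.
 * Returns the number of 4-byte characters stored, or (size_t)-1. */
size_t utf8_to_ucs4le(const char *src, size_t srclen,
                      unsigned char *dst, size_t dstbytes)
{
    assert(src != NULL && dst != NULL);

    /* "UCS-4LE" is an assumed charset name; adjust for your library. */
    iconv_t cd = iconv_open("UCS-4LE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    const char *in = src;
    char *out = (char *)dst;   /* the API is char-based regardless */
    size_t inleft = srclen;
    size_t outleft = dstbytes; /* always measured in bytes */

    size_t rc = iconv(cd, (void *)&in, &inleft, &out, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return (dstbytes - outleft) / 4;
}
```

Converting the single letter "A" yields the four octets 41 00 00 00, so the
output contains zero bytes that are character data, not terminators, and
must not be handled with string functions such as strlen().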

 IMPLEMENTATION DETAILS
Conversions between different charsets are done via the
 UCS-4 universal character set. Conversions between the same charset (e.g.
when two different aliases of the same charset are used) are done
by direct copying from the input buffer to the output one. The
 libiconv library itself usually contains only a small set of (built-in) charsets.
Tables for conversion between
 UCS-4 and particular charsets are mapped to memory from binary table files,
or C methods are loaded dynamically from shared modules:

 Coded character sets (CCS)
Each CCS file contains tables for conversion between exactly one character
of a corresponding charset and one UCS-4 character, and vice versa, a UCS-4
character to the character of the CCS charset. About 200 character sets are
supported (only those used in the FreeBSD distribution are provided in this
package), including ASCII and the following standards: ISO-8859, KOI8,
Windows, IBM-DOS, Macintosh, CJK national charsets and EBCDIC. CCS files are
accessed via memory mapping.

 Character encoding schemes (CES)
Each CES module contains functions converting a byte sequence of a
corresponding encoding scheme to exactly one UCS-4 32-bit character, and
vice versa, a UCS-4 character to a byte sequence of the CES. The following
CES groups are supported in iconv-1.0: ISO-10646 (UCS-4 and UCS-2, each in
both architecture-independent (network) and architecture-dependent
(internal) byte order versions), Unicode (UTF-16, UTF-8 and UTF-7), ISO-2022
and Extended Unix Code (EUC) (for the Chinese (CN and TW), Japanese and
Korean languages). A special table-driven CES module providing conversion
for all CCS tables is always built into the library. The ISO-2022, EUC and
table-driven modules use one or more memory-mapped CCS tables.

Any CCS table or CES module can be built into the library at compilation
time.

A CCS or CES charset can have zero or more aliases (alternative names),
which are listed in the charset.aliases file located in the same directory
as the CCS tables. The library maps the aliases file to memory to find
canonical charset names.

If iconv() encounters a character in the input buffer that is legal, but for
which an identical character does not exist in the target charset, iconv()
replaces the source character with the '_' (underscore) character and tries
to convert it into the target charset. If there is no underscore character
in the target charset, no bytes are written to the target buffer for the
source character. In any case, iconv() increments the number of
non-identical conversions performed (the value being returned as the
function result).
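
A caller that cares about lossy output can inspect the return value. The
following is a sketch with a hypothetical helper, convert_counting. The
underscore substitution is specific to this implementation; some other
iconv implementations report EILSEQ for an unconvertible character instead,
so portable code should be prepared for both outcomes.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert UTF-8 to ISO-8859-1 and hand back the
 * result of iconv() itself, i.e. the number of non-identical
 * conversions, or (size_t)-1 on error. The number of bytes produced
 * is stored in *written. */
size_t convert_counting(const char *src, size_t srclen,
                        char *dst, size_t dstsize, size_t *written)
{
    assert(src != NULL && dst != NULL && written != NULL);

    iconv_t cd = iconv_open("ISO-8859-1", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    const char *in = src;
    char *out = dst;
    size_t inleft = srclen;
    size_t outleft = dstsize;

    size_t substitutions = iconv(cd, (void *)&in, &inleft, &out, &outleft);
    iconv_close(cd);

    *written = dstsize - outleft;
    return substitutions;  /* 0 means every character mapped exactly */
}
```

A result of 0 means the conversion was exact; a positive result means that
many characters were replaced or dropped.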

 FILES
 @@TABLE_DIR@@/charset.aliases   Charset aliases file
 @@TABLE_DIR@@/*.cct             CCS conversion tables
 @@MODULE_DIR@@/*.so             CES conversion modules


 SEE ALSO
 iconv(1), iconv_close(3), iconv_open(3)