.\" Copyright (c) 1993 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Donn Seeley of BSDI.
.\"
.\" %sccs.include.redist.roff%
.\"
.\"	@(#)multibyte.3	5.1 (Berkeley) 03/02/93
.\"
.Dd ""
.Dt MULTIBYTE 3
.Os
.Sh NAME
.Nm mblen ,
.Nm mbstowcs ,
.Nm mbtowc ,
.Nm wcstombs ,
.Nm wctomb
.Nd multibyte character support for C
.Sh SYNOPSIS
.Fd #include <stdlib.h>
.Ft int
.Fn mblen "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn mbstowcs "wchar_t *wcstring" "const char *mbstring" "size_t nwchars"
.Ft int
.Fn mbtowc "wchar_t *wcharp" "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn wcstombs "char *mbstring" "const wchar_t *wcstring" "size_t nbytes"
.Ft int
.Fn wctomb "char *mbchar" "wchar_t wchar"
.Sh DESCRIPTION
The basic elements of some written natural languages such as Chinese
cannot be represented uniquely with single C
.Va char Ns s .
The C standard supports two different ways of dealing with
extended natural language encodings,
.Em wide
characters and
.Em multibyte
characters.
Wide characters are an internal representation
which allows each basic element to map
to a single object of type
.Va wchar_t .
Multibyte characters are used for input and output
and encode each basic element as a sequence of C
.Va char Ns s .
Individual basic elements may map into one or more
.Pq up to Dv MB_LEN_MAX
bytes in a multibyte character.
.Pp
The current locale
.Pq Xr setlocale 3
governs the interpretation of wide and multibyte characters.
The locale category
.Dv LC_CTYPE
specifically controls this interpretation.
The
.Va wchar_t
type is wide enough to hold the largest value
in the wide character representations for all locales.
.Pp
Multibyte strings may contain
.Sq shift
indicators to switch to and from
particular modes within the given representation.
If explicit bytes are used to signal shifting,
these are not recognized as separate characters
but are lumped with a neighboring character.
There is always a distinguished
.Sq initial
shift state.
The
.Fn mbstowcs
and
.Fn wcstombs
functions assume that multibyte strings are interpreted
starting from the initial shift state.
The
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions maintain static shift state internally.
A call with a null
.Fa mbchar
pointer returns nonzero if the current locale requires shift states,
zero otherwise;
if shift states are required, the shift state is reset to the initial state.
The internal shift states are undefined after a call to
.Fn setlocale
with the
.Dv LC_CTYPE
or
.Dv LC_ALL
categories.
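.Pp
The null-pointer query described above can be sketched as follows.
The helper name
.Fn locale_has_shift_states
is hypothetical and not part of the library;
the sketch assumes only the ANSI C behavior of
.Fn mblen .
.Bd -literal -offset indent
```c
#include <stdlib.h>

/*
 * Hypothetical helper: report whether the current locale uses
 * shift states.  As a side effect, the call resets the internal
 * shift state kept by mblen() to the initial state.
 */
int
locale_has_shift_states(void)
{
	return (mblen(NULL, 0) != 0);
}
```
.Ed
.Pp
Calling such a helper again after every
.Fn setlocale
call that changes
.Dv LC_CTYPE
is one way to return to a defined shift state.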
.Pp
For convenience in processing,
the wide character with value 0
.Pq the null wide character
is recognized as the wide character string terminator,
and the character with value 0
.Pq the null byte
is recognized as the multibyte character string terminator.
Null bytes are not permitted within multibyte characters.
.Pp
The
.Fn mblen
function computes the length in bytes
of a multibyte character
.Fa mbchar .
Up to
.Fa nbytes
bytes are examined.
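.Pp
As an illustration, the sketch below steps through a string with
.Fn mblen
to count its multibyte characters.
The helper name
.Fn mb_count
is hypothetical, and the code assumes the string is null terminated.
.Bd -literal -offset indent
```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: count the multibyte characters in s.
 * Returns -1 if an invalid multibyte character is found.
 */
long
mb_count(const char *s)
{
	size_t left = strlen(s);
	long count = 0;
	int n;

	(void)mblen(NULL, 0);	/* start from the initial shift state */
	while (left > 0) {
		n = mblen(s, left);
		if (n <= 0)	/* -1 is invalid; 0 cannot occur while left > 0 */
			return (-1);
		s += n;
		left -= (size_t)n;
		count++;
	}
	return (count);
}
```
.Ed
.Pp
In a locale where every character is a single byte,
the result equals the length of the string in bytes.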
.Pp
The
.Fn mbtowc
function converts a multibyte character
.Fa mbchar
into a wide character and stores the result
in the object pointed to by
.Fa wcharp .
Up to
.Fa nbytes
bytes are examined.
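.Pp
A minimal sketch of decoding one character with
.Fn mbtowc
follows.
The helper name
.Fn first_wchar
is hypothetical.
The sketch passes
.Dv MB_CUR_MAX ,
which ANSI C defines in
.Aq Pa stdlib.h ,
as the byte limit;
this assumes the string is null terminated.
.Bd -literal -offset indent
```c
#include <stdlib.h>

/*
 * Hypothetical helper: decode the first multibyte character of s
 * into *out.  Returns the number of bytes consumed, 0 if s starts
 * with the null byte, or -1 if no character could be recognized.
 */
int
first_wchar(const char *s, wchar_t *out)
{
	(void)mbtowc(NULL, NULL, 0);	/* start from the initial shift state */
	return (mbtowc(out, s, MB_CUR_MAX));
}
```
.Ed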
.Pp
The
.Fn wctomb
function converts a wide character
.Fa wchar
into a multibyte character and stores
the result in
.Fa mbchar .
The object pointed to by
.Fa mbchar
must be large enough to accommodate the multibyte character.
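.Pp
One way to satisfy the size requirement is to give the buffer
.Dv MB_LEN_MAX
bytes, as in the sketch below.
The helper name
.Fn encode_wchar
is hypothetical;
.Dv MB_LEN_MAX
comes from
.Aq Pa limits.h
in ANSI C.
.Bd -literal -offset indent
```c
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: encode wc into buf, which must hold at
 * least MB_LEN_MAX bytes.  Returns the number of bytes stored,
 * or -1 if wc could not be converted.  The bytes stored are not
 * followed by a null byte.
 */
int
encode_wchar(char buf[MB_LEN_MAX], wchar_t wc)
{
	(void)wctomb(NULL, 0);	/* start from the initial shift state */
	return (wctomb(buf, wc));
}
```
.Ed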
.Pp
The
.Fn mbstowcs
function converts a multibyte character string
.Fa mbstring
into a wide character string
.Fa wcstring .
No more than
.Fa nwchars
wide characters are stored.
A terminating null wide character is appended if there is room.
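.Pp
Because the terminator is stored only if there is room,
a caller may want to treat a completely filled buffer as an error.
The sketch below shows one such check;
the helper name
.Fn to_wide
and its return convention are hypothetical.
.Bd -literal -offset indent
```c
#include <stdlib.h>

/*
 * Hypothetical helper: convert src into dst, which has room for
 * dstlen wide characters.  Returns 0 on success, or -1 if src is
 * invalid or does not fit together with its terminator.
 */
int
to_wide(wchar_t *dst, size_t dstlen, const char *src)
{
	size_t n;

	if (dstlen == 0)
		return (-1);
	n = mbstowcs(dst, src, dstlen);
	if (n == (size_t)-1 || n == dstlen) {
		/* invalid input, or no room was left for the terminator */
		dst[0] = 0;
		return (-1);
	}
	return (0);
}
```
.Ed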
.Pp
The
.Fn wcstombs
function converts a wide character string
.Fa wcstring
into a multibyte character string
.Fa mbstring .
Up to
.Fa nbytes
bytes are stored in
.Fa mbstring .
Partial multibyte characters at the end of the string are not stored.
The multibyte character string is null terminated if there is room.
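.Pp
Since a short buffer silently drops trailing characters and the terminator,
one conservative approach is to size the buffer for the worst case.
The sketch below assumes that each wide character,
including the terminator, needs at most
.Dv MB_CUR_MAX
bytes;
the helper name
.Fn to_multibyte
is hypothetical.
.Bd -literal -offset indent
```c
#include <stdlib.h>

/*
 * Hypothetical helper: return a malloc()ed multibyte copy of src,
 * or NULL if memory runs out or a wide character cannot be
 * converted.  The caller frees the result.
 */
char *
to_multibyte(const wchar_t *src)
{
	size_t len = 0, cap, n;
	char *dst;

	while (src[len] != 0)	/* length in wide characters */
		len++;
	cap = (len + 1) * MB_CUR_MAX;	/* assumed worst case */
	dst = malloc(cap);
	if (dst == NULL)
		return (NULL);
	n = wcstombs(dst, src, cap);
	if (n == (size_t)-1) {
		free(dst);
		return (NULL);
	}
	return (dst);
}
```
.Ed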
.Sh RETURN VALUES
If multibyte characters are not supported in the current locale,
each character occupies a single byte;
these functions return \-1 if a character cannot be processed.
.Pp
If
.Fa mbchar
is
.Dv NULL ,
the
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions return nonzero if shift states are supported,
zero otherwise.
If
.Fa mbchar
is valid,
then these functions return
the number of bytes processed in
.Fa mbchar ,
or \-1 if no multibyte character
could be recognized or converted.
.Pp
The
.Fn mbstowcs
function returns the number of wide characters converted,
not counting any terminating null wide character.
The
.Fn wcstombs
function returns the number of bytes converted,
not counting any terminating null byte.
If an invalid multibyte character
.Pq for Fn mbstowcs
or a wide character with no multibyte representation
.Pq for Fn wcstombs
is encountered,
the function returns (size_t)\-1.
.Sh SEE ALSO
.Xr setlocale 3
.Sh STANDARDS
The
.Fn mblen ,
.Fn mbstowcs ,
.Fn mbtowc ,
.Fn wcstombs
and
.Fn wctomb
functions conform to
.St -ansiC .
.Sh HISTORY
The
.Fn mblen ,
.Fn mbstowcs ,
.Fn mbtowc ,
.Fn wcstombs
and
.Fn wctomb
functions are
.Ud
.Sh BUGS
The current implementation supports only the
.Li "\&""C""
locale.
No multibyte or wide character encodings are recognized.
225