.\" Copyright (c) 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Donn Seeley of BSDI.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by the University of
.\"	California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	@(#)multibyte.3	8.1 (Berkeley) 6/4/93
.\" $FreeBSD: src/lib/libc/locale/multibyte.3,v 1.6.2.5 2001/12/14 18:33:54 ru Exp $
.\" $DragonFly: src/lib/libc/locale/Attic/multibyte.3,v 1.2 2003/06/17 04:26:44 dillon Exp $
.\"
.Dd June 4, 1993
.Dt MULTIBYTE 3
.Os
.Sh NAME
.Nm mblen ,
.Nm mbstowcs ,
.Nm mbtowc ,
.Nm wcstombs ,
.Nm wctomb
.Nd multibyte character support for C
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In stdlib.h
.Ft int
.Fn mblen "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn mbstowcs "wchar_t *wcstring" "const char *mbstring" "size_t nwchars"
.Ft int
.Fn mbtowc "wchar_t *wcharp" "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn wcstombs "char *mbstring" "const wchar_t *wcstring" "size_t nbytes"
.Ft int
.Fn wctomb "char *mbchar" "wchar_t wchar"
.Sh DESCRIPTION
The basic elements of some written natural languages such as Chinese
cannot be represented uniquely with single C
.Va char Ns s .
The C standard supports two different ways of dealing with
extended natural language encodings,
.Em wide
characters and
.Em multibyte
characters.
Wide characters are an internal representation
which allows each basic element to map
to a single object of type
.Va wchar_t .
Multibyte characters are used for input and output
and code each basic element as a sequence of C
.Va char Ns s .
Individual basic elements may map into one or more
(up to
.Dv MB_LEN_MAX )
bytes in a multibyte character.
.Pp
The current locale
.Pq Xr setlocale 3
governs the interpretation of wide and multibyte characters.
The locale category
.Dv LC_CTYPE
specifically controls this interpretation.
The
.Va wchar_t
type is wide enough to hold the largest value
in the wide character representations for all locales.
.Pp
Multibyte strings may contain
.Sq shift
indicators to switch to and from
particular modes within the given representation.
If explicit bytes are used to signal shifting,
these are not recognized as separate characters
but are lumped with a neighboring character.
There is always a distinguished
.Sq initial
shift state.
The
.Fn mbstowcs
and
.Fn wcstombs
functions assume that multibyte strings are interpreted
starting from the initial shift state.
The
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions maintain static shift state internally.
A call with a null
.Fa mbchar
pointer returns nonzero if the current locale requires shift states,
zero otherwise;
if shift states are required, the shift state is reset to the initial state.
The internal shift states are undefined after a call to
.Fn setlocale
with the
.Dv LC_CTYPE
or
.Dv LC_ALL
categories.
.Pp
For convenience in processing,
the wide character with value 0
(the null wide character)
is recognized as the wide character string terminator,
and the character with value 0
(the null byte)
is recognized as the multibyte character string terminator.
Null bytes are not permitted within multibyte characters.
.Pp
The
.Fn mblen
function computes the length in bytes
of a multibyte character
.Fa mbchar .
Up to
.Fa nbytes
bytes are examined.
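.Pp
As an illustration only, the following sketch uses
.Fn mblen
to count the characters in a multibyte string.
The helper name count_mbchars is hypothetical and is not part of the library;
the call with a null pointer resets the internal shift state first.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Count the multibyte characters in s.
 * Returns -1 if an invalid or incomplete character is found.
 * count_mbchars is a hypothetical helper, not a library function.
 */
long
count_mbchars(const char *s)
{
	size_t left = strlen(s);
	long count = 0;

	(void)mblen(NULL, 0);		/* reset to the initial shift state */
	while (left > 0) {
		int len = mblen(s, left);

		if (len < 0)
			return (-1);	/* no valid character recognized */
		if (len == 0)
			break;		/* null byte reached */
		s += len;
		left -= (size_t)len;
		count++;
	}
	return (count);
}
```

The result depends on the
.Dv LC_CTYPE
category of the current locale, so the same bytes may yield different counts
after a call to
.Fn setlocale .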
.Pp
The
.Fn mbtowc
function converts a multibyte character
.Fa mbchar
into a wide character and stores the result
in the object pointed to by
.Fa wcharp .
Up to
.Fa nbytes
bytes are examined.
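.Pp
A minimal sketch of decoding the first character of a buffer with
.Fn mbtowc
might look as follows.
The wrapper name decode_first is an assumption made for this example.

```c
#include <stdlib.h>

/*
 * Decode the first multibyte character in the n bytes at s into *out.
 * Returns the number of bytes consumed, 0 if s starts with a null byte,
 * or -1 if no valid character is recognized.
 * decode_first is a hypothetical wrapper, not a library function.
 */
int
decode_first(const char *s, size_t n, wchar_t *out)
{
	(void)mbtowc(NULL, NULL, 0);	/* reset to the initial shift state */
	return (mbtowc(out, s, n));
}
```

The caller advances its input pointer by the returned byte count
before decoding the next character.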
.Pp
The
.Fn wctomb
function converts a wide character
.Fa wchar
into a multibyte character and stores
the result in
.Fa mbchar .
The object pointed to by
.Fa mbchar
must be large enough to accommodate the multibyte character.
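.Pp
One way to satisfy the size requirement is to give the buffer
.Dv MB_LEN_MAX
bytes, plus one if a terminating null byte is wanted.
The sketch below assumes such a buffer;
the name encode_wchar is hypothetical.

```c
#include <limits.h>
#include <stdlib.h>

/*
 * Encode wc into buf and null-terminate the result.
 * buf must hold at least MB_LEN_MAX + 1 bytes.
 * Returns the length in bytes of the encoded character, or -1 if wc
 * has no multibyte representation in the current locale.
 * encode_wchar is a hypothetical helper, not a library function.
 */
int
encode_wchar(wchar_t wc, char *buf)
{
	int len;

	(void)wctomb(NULL, L'\0');	/* reset to the initial shift state */
	len = wctomb(buf, wc);
	if (len < 0)
		return (-1);
	buf[len] = '\0';
	return (len);
}
```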
.Pp
The
.Fn mbstowcs
function converts a multibyte character string
.Fa mbstring
into a wide character string
.Fa wcstring .
No more than
.Fa nwchars
wide characters are stored.
A terminating null wide character is appended if there is room.
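.Pp
Because the terminator is only appended if there is room,
a caller that needs a terminated result may reserve one element for it.
The following is a sketch under that assumption;
the name to_wide is hypothetical.

```c
#include <stdlib.h>

/*
 * Convert src into dst, which has room for dstlen wide characters,
 * and always terminate the result (truncating if necessary).
 * Returns the number of wide characters stored, not counting the
 * terminator, or (size_t)-1 on an invalid multibyte character or
 * when dstlen is 0.
 * to_wide is a hypothetical helper, not a library function.
 */
size_t
to_wide(wchar_t *dst, size_t dstlen, const char *src)
{
	size_t n;

	if (dstlen == 0)
		return ((size_t)-1);
	n = mbstowcs(dst, src, dstlen - 1);	/* keep one slot free */
	if (n == (size_t)-1)
		return (n);
	dst[n] = L'\0';			/* n <= dstlen - 1, so in bounds */
	return (n);
}
```

The same pattern applies to
.Fn wcstombs
with a byte count in place of the wide character count.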
.Pp
The
.Fn wcstombs
function converts a wide character string
.Fa wcstring
into a multibyte character string
.Fa mbstring .
Up to
.Fa nbytes
bytes are stored in
.Fa mbstring .
Partial multibyte characters at the end of the string are not stored.
The multibyte character string is null terminated if there is room.
.Sh RETURN VALUES
If multibyte characters are not supported in the current locale,
all of these functions will return \-1 if characters can be processed,
otherwise 0.
.Pp
If
.Fa mbchar
is
.Dv NULL ,
the
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions return nonzero if shift states are supported,
zero otherwise.
If
.Fa mbchar
is valid,
then these functions return
the number of bytes processed in
.Fa mbchar ,
or \-1 if no multibyte character
could be recognized or converted.
.Pp
The
.Fn mbstowcs
function returns the number of wide characters converted,
not counting any terminating null wide character.
The
.Fn wcstombs
function returns the number of bytes converted,
not counting any terminating null byte.
If
.Fn mbstowcs
encounters an invalid multibyte character, or
.Fn wcstombs
encounters a wide character that has no multibyte representation,
the function returns (size_t)\-1.
.Sh SEE ALSO
.Xr mbrune 3 ,
.Xr rune 3 ,
.Xr setlocale 3 ,
.Xr euc 4 ,
.Xr utf2 4
.Sh STANDARDS
The
.Fn mblen ,
.Fn mbstowcs ,
.Fn mbtowc ,
.Fn wcstombs
and
.Fn wctomb
functions conform to
.St -isoC .
.Sh BUGS
The current implementation does not support shift states.