.\" Copyright (c) 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Donn Seeley of BSDI.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by the University of
.\"	California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	@(#)multibyte.3	8.1 (Berkeley) 6/4/93
.\" $FreeBSD: src/lib/libc/locale/multibyte.3,v 1.6.2.5 2001/12/14 18:33:54 ru Exp $
.\" $DragonFly: src/lib/libc/locale/Attic/multibyte.3,v 1.2 2003/06/17 04:26:44 dillon Exp $
.\"
.Dd June 4, 1993
.Dt MULTIBYTE 3
.Os
.Sh NAME
.Nm mblen ,
.Nm mbstowcs ,
.Nm mbtowc ,
.Nm wcstombs ,
.Nm wctomb
.Nd multibyte character support for C
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In stdlib.h
.Ft int
.Fn mblen "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn mbstowcs "wchar_t *wcstring" "const char *mbstring" "size_t nwchars"
.Ft int
.Fn mbtowc "wchar_t *wcharp" "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn wcstombs "char *mbstring" "const wchar_t *wcstring" "size_t nbytes"
.Ft int
.Fn wctomb "char *mbchar" "wchar_t wchar"
.Sh DESCRIPTION
The basic elements of some written natural languages such as Chinese
cannot be represented uniquely with single C
.Va char Ns s .
The C standard supports two different ways of dealing with
extended natural language encodings,
.Em wide
characters and
.Em multibyte
characters.
Wide characters are an internal representation
which allows each basic element to map
to a single object of type
.Va wchar_t .
Multibyte characters are used for input and output
and encode each basic element as a sequence of C
.Va char Ns s .
Individual basic elements may map into one or more
(up to
.Dv MB_LEN_MAX )
bytes in a multibyte character.
.Pp
The current locale
.Pq Xr setlocale 3
governs the interpretation of wide and multibyte characters.
The locale category
.Dv LC_CTYPE
specifically controls this interpretation.
The
.Va wchar_t
type is wide enough to hold the largest value
in the wide character representations for all locales.
.Pp
Multibyte strings may contain
.Sq shift
indicators to switch to and from
particular modes within the given representation.
If explicit bytes are used to signal shifting,
these are not recognized as separate characters
but are lumped with a neighboring character.
There is always a distinguished
.Sq initial
shift state.
The
.Fn mbstowcs
and
.Fn wcstombs
functions assume that multibyte strings are interpreted
starting from the initial shift state.
The
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions maintain static shift state internally.
A call with a null
.Fa mbchar
pointer returns nonzero if the current locale requires shift states,
zero otherwise;
if shift states are required, the shift state is reset to the initial state.
The internal shift states are undefined after a call to
.Fn setlocale
with the
.Dv LC_CTYPE
or
.Dv LC_ALL
categories.
.Pp
For convenience in processing,
the wide character with value 0
(the null wide character)
is recognized as the wide character string terminator,
and the character with value 0
(the null byte)
is recognized as the multibyte character string terminator.
Null bytes are not permitted within multibyte characters.
.Pp
The
.Fn mblen
function computes the length in bytes
of a multibyte character
.Fa mbchar .
Up to
.Fa nbytes
bytes are examined.
.Pp
The
.Fn mbtowc
function converts a multibyte character
.Fa mbchar
into a wide character and stores the result
in the object pointed to by
.Fa wcharp .
Up to
.Fa nbytes
bytes are examined.
.Pp
The
.Fn wctomb
function converts a wide character
.Fa wchar
into a multibyte character and stores
the result in
.Fa mbchar .
The object pointed to by
.Fa mbchar
must be large enough to accommodate the multibyte character.
.Pp
The
.Fn mbstowcs
function converts a multibyte character string
.Fa mbstring
into a wide character string
.Fa wcstring .
No more than
.Fa nwchars
wide characters are stored.
A terminating null wide character is appended if there is room.
.Pp
The
.Fn wcstombs
function converts a wide character string
.Fa wcstring
into a multibyte character string
.Fa mbstring .
Up to
.Fa nbytes
bytes are stored in
.Fa mbstring .
Partial multibyte characters at the end of the string are not stored.
The multibyte character string is null terminated if there is room.
.Sh RETURN VALUES
If multibyte characters are not supported in the current locale,
all of these functions will return \-1 if characters can be processed,
otherwise 0.
.Pp
If
.Fa mbchar
is
.Dv NULL ,
the
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions return nonzero if shift states are supported,
zero otherwise.
If
.Fa mbchar
is valid,
then these functions return
the number of bytes processed in
.Fa mbchar ,
or \-1 if no multibyte character
could be recognized or converted.
.Pp
The
.Fn mbstowcs
function returns the number of wide characters converted,
not counting any terminating null wide character.
The
.Fn wcstombs
function returns the number of bytes converted,
not counting any terminating null byte.
If any invalid multibyte characters are encountered,
both functions return \-1.
.Sh SEE ALSO
.Xr mbrune 3 ,
.Xr rune 3 ,
.Xr setlocale 3 ,
.Xr euc 4 ,
.Xr utf2 4
.Sh STANDARDS
The
.Fn mblen ,
.Fn mbstowcs ,
.Fn mbtowc ,
.Fn wcstombs
and
.Fn wctomb
functions conform to
.St -isoC .
.Sh BUGS
The current implementation does not support shift states.
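.Sh EXAMPLES
The following is a minimal sketch, not a definitive implementation,
of how the functions described in this page might be combined.
The helper names
.Fn count_mbchars
and
.Fn roundtrip ,
and the 64-element intermediate buffer,
are assumptions made for illustration and are not part of the C library.
The sketch assumes that the caller has already selected a locale with
.Xr setlocale 3 ;
without such a call a program runs in the
.Qq C
locale, in which each byte is one character.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Count the multibyte characters in s by stepping through it with mblen().
 * Returns (size_t)-1 if no valid multibyte character can be recognized.
 * count_mbchars is a hypothetical helper, not a library function.
 */
size_t
count_mbchars(const char *s)
{
	size_t count = 0;
	size_t left = strlen(s);
	int n;

	(void)mblen(NULL, 0);	/* reset the internal shift state */
	while (left > 0) {
		n = mblen(s, left);
		if (n < 0)
			return ((size_t)-1);	/* invalid or incomplete */
		if (n == 0)
			break;			/* null byte reached */
		s += n;
		left -= (size_t)n;
		count++;
	}
	return (count);
}

/*
 * Convert src to a wide character string with mbstowcs() and back into
 * dst (dstsize bytes) with wcstombs().
 * Returns 0 on success, -1 if a conversion fails or the result
 * (including its terminator) does not fit.
 * roundtrip is a hypothetical helper; the buffer size 64 is arbitrary.
 */
int
roundtrip(const char *src, char *dst, size_t dstsize)
{
	wchar_t wbuf[64];
	const size_t nwchars = sizeof(wbuf) / sizeof(wbuf[0]);
	size_t nw, nb;

	nw = mbstowcs(wbuf, src, nwchars);
	if (nw == (size_t)-1 || nw == nwchars)
		return (-1);	/* invalid input or no room for terminator */
	nb = wcstombs(dst, wbuf, dstsize);
	if (nb == (size_t)-1 || nb == dstsize)
		return (-1);	/* unconvertible or no room for terminator */
	return (0);
}
```

.Pp
Both helpers compare the return value against the buffer size because
.Fn mbstowcs
and
.Fn wcstombs
append a terminator only if there is room;
a return value equal to the size passed in means the output is not terminated.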