.\" Copyright (c) 1993 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Donn Seeley of BSDI.
.\"
.\" %sccs.include.redist.roff%
.\"
.\" @(#)multibyte.3 5.1 (Berkeley) 03/02/93
.\"
.Dd ""
.Dt MULTIBYTE 3
.Os
.Sh NAME
.Nm mblen ,
.Nm mbstowcs ,
.Nm mbtowc ,
.Nm wcstombs ,
.Nm wctomb
.Nd multibyte character support for C
.Sh SYNOPSIS
.Fd #include <stdlib.h>
.Ft int
.Fn mblen "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn mbstowcs "wchar_t *wcstring" "const char *mbstring" "size_t nwchars"
.Ft int
.Fn mbtowc "wchar_t *wcharp" "const char *mbchar" "size_t nbytes"
.Ft size_t
.Fn wcstombs "char *mbstring" "const wchar_t *wcstring" "size_t nbytes"
.Ft int
.Fn wctomb "char *mbchar" "wchar_t wchar"
.Sh DESCRIPTION
The basic elements of some written natural languages such as Chinese
cannot be represented uniquely with single C
.Va char Ns s .
The C standard supports two different ways of dealing with
extended natural language encodings,
.Em wide
characters and
.Em multibyte
characters.
Wide characters are an internal representation
which allows each basic element to map
to a single object of type
.Va wchar_t .
Multibyte characters are used for input and output
and code each basic element as a sequence of C
.Va char Ns s .
Individual basic elements may map into one or more
.Pq up to Dv MB_LEN_MAX
bytes in a multibyte character.
.Pp
The current locale
.Pq Xr setlocale 3
governs the interpretation of wide and multibyte characters.
The locale category
.Dv LC_CTYPE
specifically controls this interpretation.
The
.Va wchar_t
type is wide enough to hold the largest value
in the wide character representations for all locales.
.Pp
Multibyte strings may contain
.Sq shift
indicators to switch to and from
particular modes within the given representation.
If explicit bytes are used to signal shifting,
these are not recognized as separate characters
but are lumped with a neighboring character.
There is always a distinguished
.Sq initial
shift state.
The
.Fn mbstowcs
and
.Fn wcstombs
functions assume that multibyte strings are interpreted
starting from the initial shift state.
The
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions maintain static shift state internally.
A call with a null
.Fa mbchar
pointer returns nonzero if the current locale requires shift states,
zero otherwise;
if shift states are required, the shift state is reset to the initial state.
The internal shift states are undefined after a call to
.Fn setlocale
with the
.Dv LC_CTYPE
or
.Dv LC_ALL
categories.
.Pp
For convenience in processing,
the wide character with value 0
.Pq the null wide character
is recognized as the wide character string terminator,
and the character with value 0
.Pq the null byte
is recognized as the multibyte character string terminator.
Null bytes are not permitted within multibyte characters.
.Pp
The
.Fn mblen
function computes the length in bytes
of a multibyte character
.Fa mbchar .
Up to
.Fa nbytes
bytes are examined.
.Pp
The
.Fn mbtowc
function converts a multibyte character
.Fa mbchar
into a wide character and stores the result
in the object pointed to by
.Fa wcharp .
Up to
.Fa nbytes
bytes are examined.
.Pp
The
.Fn wctomb
function converts a wide character
.Fa wchar
into a multibyte character and stores
the result in
.Fa mbchar .
The object pointed to by
.Fa mbchar
must be large enough to accommodate the multibyte character.
.Pp
The
.Fn mbstowcs
function converts a multibyte character string
.Fa mbstring
into a wide character string
.Fa wcstring .
No more than
.Fa nwchars
wide characters are stored.
A terminating null wide character is appended if there is room.
.Pp
The
.Fn wcstombs
function converts a wide character string
.Fa wcstring
into a multibyte character string
.Fa mbstring .
Up to
.Fa nbytes
bytes are stored in
.Fa mbstring .
Partial multibyte characters at the end of the string are not stored.
The multibyte character string is null terminated if there is room.
.Sh RETURN VALUES
If multibyte characters are not supported in the current locale,
all of these functions will return \-1 if characters can be processed,
otherwise 0.
.Pp
If
.Fa mbchar
is
.Dv NULL ,
the
.Fn mblen ,
.Fn mbtowc
and
.Fn wctomb
functions return nonzero if shift states are supported,
zero otherwise.
If
.Fa mbchar
is valid,
then these functions return
the number of bytes processed in
.Fa mbchar ,
or \-1 if no multibyte character
could be recognized or converted.
.Pp
The
.Fn mbstowcs
function returns the number of wide characters converted,
not counting any terminating null wide character.
The
.Fn wcstombs
function returns the number of bytes converted,
not counting any terminating null byte.
If any invalid multibyte characters are encountered,
both functions return \-1.
.Sh SEE ALSO
.Xr setlocale 3
.Sh STANDARDS
The
.Fn mblen ,
.Fn mbstowcs ,
.Fn mbtowc ,
.Fn wcstombs
and
.Fn wctomb
functions conform to
.St -ansiC .
.Sh HISTORY
The
.Fn mblen ,
.Fn mbstowcs ,
.Fn mbtowc ,
.Fn wcstombs
and
.Fn wctomb
functions are
.Ud
.Sh BUGS
The current implementation supports only the
.Li "\&""C""
locale.
No multibyte or wide character encodings are recognized.
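.Sh EXAMPLES
The following is a minimal sketch, not a definitive implementation,
of how
.Fn mblen
and
.Fn mbstowcs
might be used together.
The helper names
.Fn count_mbchars
and
.Fn to_wide
are hypothetical and are not part of the library.
The sketch assumes that the caller has already selected any desired locale with
.Xr setlocale 3
and that the input is a null terminated multibyte string.
.Bd -literal
```c
#include <stdlib.h>
#include <string.h>

/*
 * count_mbchars: hypothetical helper, not a library function.
 * Returns the number of multibyte characters in s under the
 * current LC_CTYPE locale, or (size_t)-1 on an invalid sequence.
 */
size_t
count_mbchars(const char *s)
{
	size_t count = 0;
	size_t left = strlen(s);
	int len;

	/* A null pointer resets the internal shift state. */
	(void)mblen(NULL, 0);

	while (left > 0) {
		/* Examine no more than the bytes that remain. */
		len = mblen(s, left);
		if (len < 0)
			return ((size_t)-1);
		if (len == 0)	/* null byte: end of string */
			break;
		s += len;
		left -= (size_t)len;
		count++;
	}
	return (count);
}

/*
 * to_wide: hypothetical helper, not a library function.
 * Converts s into a newly allocated wide character string.
 * Returns NULL on an invalid sequence or allocation failure;
 * the caller frees the result.
 */
wchar_t *
to_wide(const char *s)
{
	/*
	 * Each wide character consumes at least one byte, so
	 * strlen(s) + 1 elements always leave room for the
	 * terminating null wide character.
	 */
	size_t max = strlen(s) + 1;
	wchar_t *ws = malloc(max * sizeof(*ws));

	if (ws == NULL)
		return (NULL);
	if (mbstowcs(ws, s, max) == (size_t)-1) {
		free(ws);
		return (NULL);
	}
	return (ws);
}
```
.Ed
.Pp
For plain ASCII input in the
.Qq C
locale, one would expect
.Fn count_mbchars
to return the same value as
.Xr strlen 3 ,
and
.Fn to_wide
to produce one wide character per input byte.
Sizing the buffer from the byte length avoids a separate counting pass,
at the cost of some unused space when characters occupy several bytes.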