.\" $OpenBSD: mbrtowc.3,v 1.3 2010/12/05 14:59:49 stsp Exp $
.\" $NetBSD: mbrtowc.3,v 1.5 2003/09/08 17:54:31 wiz Exp $
.\"
.\" Copyright (c)2002 Citrus Project,
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd $Mdocdate: December 5 2010 $
.Dt MBRTOWC 3
.Os
.Sh NAME
.Nm mbrtowc
.Nd converts a multibyte character to a wide character (restartable)
.Sh SYNOPSIS
.Fd #include <wchar.h>
.Ft size_t
.Fn mbrtowc "wchar_t * restrict wc" "const char * restrict s" "size_t n" \
"mbstate_t * restrict mbs"
.Sh DESCRIPTION
The
.Fn mbrtowc
function examines at most
.Fa n
bytes of the multibyte character byte string pointed to by
.Fa s ,
converts those bytes to a wide character, and stores the wide character
in the wchar_t object pointed to by
.Fa wc
if
.Fa wc
is not
.Dv NULL
and
.Fa s
points to a valid character.
.Pp
Conversion happens in accordance with the conversion state described
by the mbstate_t object pointed to by
.Fa mbs .
The mbstate_t object must be initialized to zero before the application's
first call to
.Fn mbrtowc .
If the previous call to
.Fn mbrtowc
did not return (size_t)-1, the mbstate_t object can safely be reused
without reinitialization.
.Pp
The behaviour of
.Fn mbrtowc
is affected by the
.Dv LC_CTYPE
category of the current locale.
If the locale is changed without reinitialization of the mbstate_t object
pointed to by
.Fa mbs ,
the behaviour of
.Fn mbrtowc
is undefined.
.Pp
Unlike
.Xr mbtowc 3 ,
.Fn mbrtowc
will accept an incomplete byte sequence pointed to by
.Fa s
which does not form a complete character but is potentially part of
a valid character.
In this case,
.Fn mbrtowc
consumes all such bytes.
The conversion state saved in the mbstate_t object pointed to by
.Fa mbs
will be used to restart the suspended conversion during the next
call to
.Fn mbrtowc .
.Pp
In state-dependent encodings,
.Fa s
may point to a special sequence of bytes called a
.Dq shift sequence .
Shift sequences switch between character code sets available within an
encoding scheme.
One encoding scheme using shift sequences is ISO/IEC 2022-JP, which
can switch e.g. from ASCII (which uses one byte per character) to
JIS X 0208 (which uses two bytes per character).
Shift sequence bytes correspond to no individual wide character, so
.Fn mbrtowc
treats them as if they were part of the subsequent multibyte character.
Therefore they do contribute to the number of bytes in the multibyte character.
.Pp
Special cases in interpretation of arguments are as follows:
.Bl -tag -width 012345678901
.It "wc == NULL "
The conversion from a multibyte character to a wide character is performed
and the conversion state may be affected, but the resulting wide character
is discarded.
.Pp
This can be used to find out how many bytes are contained in the
multibyte character pointed to by
.Fa s .
.It "s == NULL "
.Fn mbrtowc
ignores
.Fa wc
and
.Fa n ,
and behaves equivalently to
.Bd -literal -offset indent
mbrtowc(NULL, "", 1, mbs);
.Ed
.Pp
which attempts to use the mbstate_t object pointed to by
.Fa mbs
to start or continue conversion using the empty string as input,
and discards the conversion result.
.Pp
If conversion succeeds, this call always returns zero.
Unlike
.Xr mbtowc 3 ,
the value returned does not indicate whether the current encoding of
the locale is state-dependent, i.e. uses shift sequences.
.It "mbs == NULL "
.Fn mbrtowc
uses its own internal state object to keep the conversion state,
instead of an mbstate_t object pointed to by
.Fa mbs .
This internal conversion state is initialized once at program startup.
It is not safe to call
.Fn mbrtowc
again with a
.Dv NULL
.Fa mbs
argument if
.Fn mbrtowc
returned (size_t)-1 because at this point the internal conversion state
is undefined.
.Pp
Calling any other functions in
.Em libc
never changes the internal
conversion state object of
.Fn mbrtowc .
.El
.Sh RETURN VALUES
.Bl -tag -width 012345678901
.It 0
The bytes pointed to by
.Fa s
form a terminating NUL character.
If
.Fa wc
is not
.Dv NULL ,
a NUL wide character has been stored in the wchar_t object pointed to by
.Fa wc .
.It positive
.Fa s
points to a valid character, and the value returned is the number of
bytes completing the character.
If
.Fa wc
is not
.Dv NULL ,
the corresponding wide character has been stored in the wchar_t object
pointed to by
.Fa wc .
.It (size_t)-1
.Fa s
points to an illegal byte sequence which does not form a valid multibyte
character in the current locale.
.Fn mbrtowc
sets
.Va errno
to EILSEQ.
The conversion state object pointed to by
.Fa mbs
is left in an undefined state and must be reinitialized before being
used again.
.Pp
Because applications using
.Fn mbrtowc
are shielded from the specifics of the multibyte character encoding scheme,
it is impossible to repair byte sequences containing encoding errors.
Such byte sequences must be treated as invalid and potentially malicious input.
Applications must stop processing the byte string pointed to by
.Fa s
and either discard any wide characters already converted, or cope with
truncated input.
.It (size_t)-2
.Fa s
points to an incomplete byte sequence of length
.Fa n
which has been consumed and contains part of a valid multibyte character.
This is not an error, and no wide character is stored.
The character may be completed by calling
.Fn mbrtowc
again with
.Fa s
pointing to one or more subsequent bytes of the multibyte character and
.Fa mbs
pointing to the conversion state object used during conversion of the
incomplete byte sequence.
.El
.Sh ERRORS
The
.Fn mbrtowc
function may cause an error in the following cases:
.Bl -tag -width Er
.It Bq Er EILSEQ
.Fa s
points to an invalid multibyte character.
.It Bq Er EINVAL
.Fa mbs
points to an invalid or uninitialized mbstate_t object.
.El
.Sh SEE ALSO
.Xr mbrlen 3 ,
.Xr mbtowc 3 ,
.Xr setlocale 3
.Sh STANDARDS
The
.Fn mbrtowc
function conforms to
.\" .St -isoC-amd1 .
ISO/IEC 9899/AMD1:1995
.Pq Dq ISO C90, Amendment 1 .
The restrict qualifier is added at
.\" .St -isoC99 .
ISO/IEC 9899:1999
.Pq Dq ISO C99 .
.Sh CAVEATS
.Fn mbrtowc
is not suitable for programs that care about internals of the character
encoding scheme used by the byte string pointed to by
.Fa s .
.Pp
It is possible that
.Fn mbrtowc
fails because of locale configuration errors.
An
.Dq invalid
character sequence may simply be encoded in a different encoding than that
of the current locale.
.Pp
The special cases for
.Fa s
== NULL and
.Fa mbs
== NULL do not make any sense.
Instead of passing
.Dv NULL
for
.Fa mbs ,
.Xr mbtowc 3
can be used.
.Pp
Earlier versions of this man page implied that calling
.Fn mbrtowc
with a
.Dv NULL
.Fa s
argument would always set
.Fa mbs
to the initial conversion state.
But this is true only if the previous call to
.Fn mbrtowc
using
.Fa mbs
did not return (size_t)-1 or (size_t)-2.
It is recommended to zero the mbstate_t object instead.
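.Sh EXAMPLES
The return values described in this page can be handled in a loop like the
following minimal sketch.
It is an illustration under stated assumptions, not a definitive
implementation.
The helper
.Fn count_mb_chars
and its
.Fa pending
argument are hypothetical names chosen for this example and are not
part of libc.
The sketch assumes that the caller has already selected the desired
.Dv LC_CTYPE
locale with
.Xr setlocale 3 .
.Bd -literal
```c
#include <string.h>
#include <wchar.h>

/*
 * count_mb_chars() is a hypothetical helper, not a libc function.
 * It returns the number of complete characters found in the first
 * n bytes of s, stopping early at a terminating NUL, or -1 if an
 * invalid byte sequence is found.  *pending is set to 1 if the
 * buffer ends in the middle of a character, and to 0 otherwise.
 */
long
count_mb_chars(const char *s, size_t n, int *pending)
{
	mbstate_t mbs;
	wchar_t wc;
	size_t len;
	long count = 0;

	/* Zero the state object before the first call. */
	memset(&mbs, 0, sizeof(mbs));
	*pending = 0;

	while (n > 0) {
		len = mbrtowc(&wc, s, n, &mbs);
		if (len == (size_t)-1)
			return -1;	/* invalid input; mbs is undefined */
		if (len == (size_t)-2) {
			*pending = 1;	/* all n bytes consumed, no character yet */
			break;
		}
		if (len == 0)
			break;		/* terminating NUL character */
		count++;
		s += len;
		n -= len;
	}
	return count;
}
```
.Ed
.Pp
For instance, in the default C locale, passing the three bytes of the string
.Dq abc
is expected to yield a count of 3 with
.Fa pending
left at zero.
A caller that sees
.Fa pending
set would normally keep its own mbstate_t object and continue with the next
chunk of input, which this simplified helper does not do.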