.\" $OpenBSD: mbrtowc.3,v 1.7 2023/09/12 08:33:37 jsg Exp $
.\" $NetBSD: mbrtowc.3,v 1.5 2003/09/08 17:54:31 wiz Exp $
.\"
.\" Copyright (c)2023 Ingo Schwarze <schwarze@openbsd.org>
.\" Copyright (c)2010 Stefan Sperling <stsp@openbsd.org>
.\" Copyright (c)2002 Citrus Project,
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd $Mdocdate: September 12 2023 $
.Dt MBRTOWC 3
.Os
.Sh NAME
.Nm mbrtowc ,
.Nm mbrtoc32
.Nd convert a multibyte character to a wide character (restartable)
.Sh SYNOPSIS
.In wchar.h
.Ft size_t
.Fo mbrtowc
.Fa "wchar_t * restrict wc"
.Fa "const char * restrict s"
.Fa "size_t n"
.Fa "mbstate_t * restrict mbs"
.Fc
.In uchar.h
.Ft size_t
.Fo mbrtoc32
.Fa "char32_t * restrict wc"
.Fa "const char * restrict s"
.Fa "size_t n"
.Fa "mbstate_t * restrict mbs"
.Fc
.Sh DESCRIPTION
The
.Fn mbrtowc
and
.Fn mbrtoc32
functions examine at most
.Fa n
bytes of the multibyte character byte string pointed to by
.Fa s ,
convert those bytes to a wide character, and store the wide character into
.Pf * Fa wc
if
.Fa wc
is not
.Dv NULL
and
.Fa s
points to a valid character.
.Pp
Conversion happens in accordance with the conversion state
.Pf * Fa mbs ,
which must be initialized to zero before the application's first call to
.Fn mbrtowc
or
.Fn mbrtoc32 .
If the previous call did not return
.Po Vt size_t Pc Ns \-1 ,
.Fa mbs
can safely be reused without reinitialization.
.Pp
The input encoding that
.Fn mbrtowc
and
.Fn mbrtoc32
use for
.Fa s
is determined by the
.Dv LC_CTYPE
category of the current locale.
If the locale is changed without reinitialization of
.Pf * Fa mbs ,
the behaviour is undefined.
.Pp
Unlike
.Xr mbtowc 3 ,
.Fn mbrtowc
and
.Fn mbrtoc32
accept an incomplete byte sequence pointed to by
.Fa s
which does not form a complete character but is potentially part of
a valid character.
In this case, both functions consume all such bytes.
The conversion state saved in
.Pf * Fa mbs
will be used to restart the suspended conversion during the next call.
.Pp
On systems other than
.Ox
that support state-dependent encodings,
.Fa s
may point to a special sequence of bytes called a
.Dq shift sequence .
Shift sequences switch between character code sets available within an
encoding scheme.
One encoding scheme using shift sequences is ISO/IEC 2022-JP, which
can switch e.g. from ASCII (which uses one byte per character) to
JIS X 0208 (which uses two bytes per character).
Shift sequence bytes correspond to no individual wide character, so
.Fn mbrtowc
and
.Fn mbrtoc32
treat them as if they were part of the subsequent multibyte character.
Therefore they do contribute to the number of bytes in the multibyte character.
.Pp
The following arguments cause special processing:
.Bl -tag -width 012345678901
.It Fa wc No == Dv NULL
The conversion from a multibyte character to a wide character is performed
and the conversion state may be affected, but the resulting wide character
is discarded.
This can be used to find out how many bytes are contained in the
multibyte character pointed to by
.Fa s .
.It Fa s No == Dv NULL
The arguments
.Fa wc
and
.Fa n
are ignored and starting or continuing the conversion with an empty string
is attempted, discarding the conversion result.
If conversion succeeds, this call always returns zero.
Unlike
.Xr mbtowc 3 ,
the value returned does not indicate whether the current encoding of
the locale is state-dependent, i.e. uses shift sequences.
.It Fa mbs No == Dv NULL
.Fn mbrtowc
and
.Fn mbrtoc32
each use their own internal state object instead of the
.Fa mbs
argument.
Both internal state objects are initialized at startup time of the program,
and no other libc function ever changes either of them.
.Pp
If
.Fn mbrtowc
or
.Fn mbrtoc32
is called with a
.Dv NULL
.Fa mbs
argument and that call returns
.Po Vt size_t Pc Ns \-1 ,
the internal conversion state of the respective function becomes
permanently undefined and there is no way to reset it to any defined state.
Consequently, after such a mishap, it is not safe
to call the same function with a
.Dv NULL
.Fa mbs
argument ever again until the program is terminated.
.El
.Sh RETURN VALUES
.Bl -tag -width 012345678901
.It 0
The bytes pointed to by
.Fa s
form a terminating NUL character.
If
.Fa wc
is not
.Dv NULL ,
a NUL wide character has been stored in the wchar_t object pointed to by
.Fa wc .
.It positive
.Fa s
points to a valid character, and the value returned is the number of
bytes completing the character.
If
.Fa wc
is not
.Dv NULL ,
the corresponding wide character has been stored in the wchar_t object
pointed to by
.Fa wc .
.It Po Vt size_t Pc Ns \-1
.Fa s
points to an illegal byte sequence which does not form a valid multibyte
character in the current locale, or
.Fa mbs
points to an invalid or uninitialized object.
.Va errno
is set to
.Er EILSEQ
or
.Er EINVAL ,
respectively.
The conversion state object pointed to by
.Fa mbs
is left in an undefined state and must be reinitialized before being
used again.
.Pp
Because applications using
.Fn mbrtowc
or
.Fn mbrtoc32
are shielded from the specifics of the multibyte character encoding scheme,
it is impossible to repair byte sequences containing encoding errors.
Such byte sequences must be treated as invalid and potentially malicious input.
Applications must stop processing the byte string pointed to by
.Fa s
and either discard any wide characters already converted, or cope with
truncated input.
.It Po Vt size_t Pc Ns \-2
.Fa s
points to an incomplete byte sequence of length
.Fa n
which has been consumed and contains part of a valid multibyte character.
The character may be completed by calling the same function again with
.Fa s
pointing to one or more subsequent bytes of the multibyte character and
.Fa mbs
pointing to the conversion state object used during conversion of the
incomplete byte sequence.
.It Po Vt size_t Pc Ns \-3
The next character resulting from a previous call has been stored into
.Fa wc ,
without consuming any additional bytes from
.Fa s .
This never happens for
.Fn mbrtowc ,
and on
.Ox ,
it never happens for
.Fn mbrtoc32
either.
.El
.Sh ERRORS
.Fn mbrtowc
and
.Fn mbrtoc32
cause an error in the following cases:
.Bl -tag -width Er
.It Bq Er EILSEQ
.Fa s
points to an invalid multibyte character.
.It Bq Er EINVAL
.Fa mbs
points to an invalid or uninitialized
.Vt mbstate_t
object.
.El
.Sh SEE ALSO
.Xr mbrlen 3 ,
.Xr mbtowc 3 ,
.Xr setlocale 3 ,
.Xr wcrtomb 3
.Sh STANDARDS
.Fn mbrtowc
conforms to
.St -isoC-amd1 .
The restrict qualifier was added at
.St -isoC-99 .
.Pp
.Fn mbrtoc32
conforms to
.St -isoC-2011 .
.Sh HISTORY
.Fn mbrtowc
has been available since
.Ox 3.8
and has provided support for UTF-8 since
.Ox 4.8 .
.Pp
.Fn mbrtoc32
has been available since
.Ox 7.4 .
.Sh CAVEATS
.Fn mbrtowc
and
.Fn mbrtoc32
are not suitable for programs that care about internals of the character
encoding scheme used by the byte string pointed to by
.Fa s .
.Pp
It is possible that these functions
fail because of locale configuration errors.
An
.Dq invalid
character sequence may simply be encoded in a different encoding than that
of the current locale.
.Pp
The special cases for
.Fa s No == Dv NULL
and
.Fa mbs No == Dv NULL
do not make any sense.
Instead of passing
.Dv NULL
for
.Fa mbs ,
.Xr mbtowc 3
can be used.
.Pp
Earlier versions of this man page implied that calling
.Fn mbrtowc
with a
.Dv NULL
.Fa s
argument would always set
.Fa mbs
to the initial conversion state.
But this is true only if the previous call to
.Fn mbrtowc
using
.Fa mbs
did not return (size_t)-1 or (size_t)-2.
It is recommended to zero the mbstate_t object instead.
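.Pp
A minimal sketch of that recommendation, assuming the input is a
complete byte buffer of known length:
the hypothetical helper
.Fn count_wide_chars ,
which is not part of libc, zeroes a local
.Vt mbstate_t
object with
.Xr memset 3
before its first call to
.Fn mbrtowc
and gives up as soon as
.Po Vt size_t Pc Ns \-1
or
.Po Vt size_t Pc Ns \-2
is returned.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * count_wide_chars() is a hypothetical helper, not part of libc.
 * It counts the characters found in the first len bytes of s,
 * decoding with the LC_CTYPE encoding of the current locale and
 * stopping early at a terminating NUL, which is not counted.
 * Returns 0 on success, or -1 on an encoding error or if the
 * buffer ends in the middle of a character.
 */
int
count_wide_chars(const char *s, size_t len, size_t *count)
{
	mbstate_t mbs;
	wchar_t wc;
	size_t n;

	assert(s != NULL && count != NULL);

	/* Zero the state instead of calling mbrtowc() with s == NULL. */
	memset(&mbs, 0, sizeof(mbs));
	*count = 0;

	while (len > 0) {
		n = mbrtowc(&wc, s, len, &mbs);
		if (n == (size_t)-1)
			return -1;	/* invalid input; mbs is undefined */
		if (n == (size_t)-2)
			return -1;	/* character truncated by len */
		if (n == 0)
			break;		/* terminating NUL reached */
		s += n;
		len -= n;
		(*count)++;
	}
	return 0;
}
```
.Ed
.Pp
A caller would normally select the locale with
.Xr setlocale 3
before using such a helper; a program that never does so runs in the
.Qq C
locale.
Because the state object is local and freshly zeroed on every call,
an error return leaves no damaged state behind.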