.\" $OpenBSD: mbtowc.3,v 1.8 2023/11/11 01:38:23 schwarze Exp $
.\" $NetBSD: mbtowc.3,v 1.5 2003/09/08 17:54:31 wiz Exp $
.\"
.\" Copyright (c) 2016, 2023 Ingo Schwarze <schwarze@openbsd.org>
.\" Copyright (c) 2010, 2015 Stefan Sperling <stsp@openbsd.org>
.\" Copyright (c) 2002 Citrus Project,
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd $Mdocdate: November 11 2023 $
.Dt MBTOWC 3
.Os
.Sh NAME
.Nm mbtowc
.Nd converts a multibyte character to a wide character
.Sh SYNOPSIS
.In stdlib.h
.Ft int
.Fn mbtowc "wchar_t * restrict pwc" "const char * restrict s" "size_t n"
.Sh DESCRIPTION
The
.Fn mbtowc
function converts the multibyte character pointed to by
.Fa s
to a wide character and stores it in the
.Vt wchar_t
object pointed to by
.Fa pwc .
This function may inspect at most
.Fa n
bytes of the array pointed to by
.Fa s .
.Pp
Unlike
.Xr mbrtowc 3 ,
.Fn mbtowc
requires the entire multibyte character to lie within the first
.Fa n
bytes pointed to by
.Fa s .
Otherwise, this function returns an error and the internal state
becomes undefined.
.Pp
If a call to
.Fn mbtowc
results in an undefined internal state, parsing of the string starting at
.Fa s
cannot continue, not even at a later byte, and
.Fn mbtowc
must be called with
.Fa s
set to
.Dv NULL
to reset the internal state before it can safely be used again
on a different string.
.Pp
The behaviour of
.Fn mbtowc
is affected by the
.Dv LC_CTYPE
category of the current locale.
Calling any other function in
.Em libc
never changes the internal state of
.Fn mbtowc ,
except for calling
.Xr setlocale 3
with the
.Dv LC_CTYPE
category set to a different locale.
Such
.Xr setlocale 3
calls cause the internal state of this function to be undefined.
.Pp
In state-dependent encodings such as ISO-2022-JP,
.Fa s
may point to a special sequence of bytes that changes the shift state.
Because such shift sequences do not correspond to any individual wide character,
.Fn mbtowc
treats them as if they were part of the subsequent multibyte character.
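.Pp
As a sketch of how these rules combine, the following C helper converts
a buffer one character at a time.
It is not part of the
.Fn mbtowc
interface: the name
.Fn decode_buf
and its parameters are hypothetical, and the caller is assumed to have
selected a locale with
.Xr setlocale 3
beforehand.
Each call is limited to the bytes remaining in the buffer, capped at
.Dv MB_CUR_MAX ,
and the internal state is reset both before the first call and after
a failure.
If
.Fa buf
is
.Dv NULL ,
the helper passes a
.Dv NULL
.Fa pwc ,
so the converted characters are only counted.

```c
#include <assert.h>
#include <locale.h>	/* setlocale(3): the caller selects LC_CTYPE */
#include <stdlib.h>

/*
 * Hypothetical helper: convert at most len bytes of s, storing up to
 * bufsz wide characters in buf.  If buf is NULL, the characters are
 * only counted.  Conversion stops early at a null byte.  Returns the
 * number of wide characters converted, or -1 if mbtowc() fails.
 */
static int
decode_buf(const char *s, size_t len, wchar_t *buf, size_t bufsz)
{
	size_t nwc = 0;
	int r;

	assert(s != NULL);	/* a NULL s would only reset the state */

	/* Start from the initial state. */
	(void)mbtowc(NULL, NULL, MB_CUR_MAX);

	while (len > 0 && (buf == NULL || nwc < bufsz)) {
		/* Never let mbtowc() look past the end of the buffer. */
		r = mbtowc(buf == NULL ? NULL : &buf[nwc], s,
		    len < MB_CUR_MAX ? len : MB_CUR_MAX);
		if (r == -1) {
			/* The state is now undefined: reset it before reuse. */
			(void)mbtowc(NULL, NULL, MB_CUR_MAX);
			return -1;
		}
		if (r == 0)	/* null byte: end of string */
			break;
		s += r;
		len -= (size_t)r;
		nwc++;
	}
	return (int)nwc;
}
```

.Pp
With a UTF-8
.Dv LC_CTYPE ,
the two bytes 0xc3 0xa9 decode to the single wide character U+00E9,
so the helper returns 1 for a length of 2, and -1 for a length of 1
because one byte does not contain the entire character.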
.Pp
The following special cases apply to the arguments:
.Bl -tag -width 012345678901
.It s == NULL
.Fn mbtowc
initializes its own internal state to the initial state and
determines whether the current encoding is state-dependent.
.Fn mbtowc
returns 0 if the encoding is state-independent,
otherwise non-zero.
.Fa pwc
is ignored.
.It pwc == NULL
.Fn mbtowc
behaves just as if
.Fa pwc
were not
.Dv NULL ,
including modifications to internal state,
except that the result of the conversion is discarded.
This can be used to determine how many wide characters
are needed to represent a multibyte string.
It can also be used to check for invalid or incomplete multibyte sequences.
.It n == 0
Zero bytes can never form a complete character, so
.Fn mbtowc
always fails.
.El
.Sh RETURN VALUES
Normally,
.Fn mbtowc
returns:
.Bl -tag -width 012345678901
.It 0
.Fa s
points to a null byte
.Pq Sq \e0 .
.It positive
The number of bytes in the valid multibyte character pointed to by
.Fa s .
The returned value is never greater than the value of the
.Dv MB_CUR_MAX
macro.
.It -1
.Fa s
points to an invalid or incomplete multibyte character.
.Va errno
is set to indicate the error.
.El
.Pp
When
.Fa s
is
.Dv NULL ,
.Fn mbtowc
returns:
.Bl -tag -width 0123456789
.It 0
The current encoding is state-independent.
.It non-zero
The current encoding is state-dependent.
.El
.Sh EXAMPLES
The following program parses a UTF-8 string and reports encoding errors:
.Bd -literal
#include <limits.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	char s[LINE_MAX];
	wchar_t wc;
	int i, len;

	setlocale(LC_CTYPE, "C.UTF-8");
	if (fgets(s, sizeof(s), stdin) == NULL)
		*s = '\e0';
	for (i = 0, len = 1; len != 0; i += len) {
		switch (len = mbtowc(&wc, s + i, MB_CUR_MAX)) {
		case 0:
			printf("byte %d end of string 0x00\en", i);
			break;
		case -1:
			printf("byte %d invalid 0x%.2hhx\en", i, s[i]);
			len = 1;
			break;
		default:
			printf("byte %d U+%.4X %lc\en", i, wc, wc);
			break;
		}
	}
	return 0;
}
.Ed
.Pp
Recovering from encoding errors and continuing to parse the rest of the
string as shown above is only possible for state-independent character
encodings.
For full generality, the error handling can be modified
to reset the internal state.
In that case, the rest of the string has to be skipped
if the encoding is state-dependent.
The following replacement for the -1 case sets
.Va len
to 1 if the encoding is state-independent, so parsing resumes at the
next byte, and to 0 otherwise, which ends the loop:
.Bd -literal
		case -1:
			printf("byte %d invalid 0x%.2hhx\en", i, s[i]);
			len = !mbtowc(NULL, NULL, MB_CUR_MAX);
			break;
.Ed
.Sh ERRORS
.Fn mbtowc
sets
.Va errno
in the following case:
.Bl -tag -width Er
.It Bq Er EILSEQ
.Fa s
points to an invalid or incomplete multibyte character.
.El
.Sh SEE ALSO
.Xr mblen 3 ,
.Xr mbrtowc 3 ,
.Xr setlocale 3
.Sh STANDARDS
The
.Fn mbtowc
function conforms to
.St -ansiC .
The restrict qualifier was added in
.St -isoC-99 .
Setting
.Va errno
is an
.St -p1003.1-2008
extension.
.Sh CAVEATS
On error, callers of
.Fn mbtowc
cannot tell whether the multibyte character was invalid or incomplete.
To treat incomplete data differently from invalid data, use
.Xr mbrtowc 3
instead.
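.Pp
A minimal sketch of that alternative, assuming a caller that only needs to
classify the bytes at the start of a string: the helper
.Fn classify_seq
and the
.Vt enum seq_class
values are hypothetical names.
It starts from a zeroed
.Vt mbstate_t ,
which is the initial conversion state, and relies on
.Xr mbrtowc 3
returning (size_t)-2 for a valid but incomplete sequence and
(size_t)-1 for an invalid one.

```c
#include <assert.h>
#include <locale.h>	/* setlocale(3): the caller selects LC_CTYPE */
#include <string.h>
#include <wchar.h>

/* Hypothetical classification of the bytes at the start of s. */
enum seq_class { SEQ_NUL, SEQ_VALID, SEQ_INCOMPLETE, SEQ_INVALID };

/*
 * Examine at most n bytes of s.  On SEQ_NUL or SEQ_VALID, *lenp is set
 * to the number of bytes used and *pwc (if pwc is not NULL) receives
 * the wide character.
 */
static enum seq_class
classify_seq(const char *s, size_t n, wchar_t *pwc, size_t *lenp)
{
	mbstate_t ps;
	size_t r;

	assert(s != NULL && lenp != NULL);	/* a NULL s would reset ps */
	memset(&ps, 0, sizeof(ps));		/* initial conversion state */
	r = mbrtowc(pwc, s, n, &ps);
	if (r == (size_t)-1)		/* invalid: errno is EILSEQ */
		return SEQ_INVALID;
	if (r == (size_t)-2)		/* valid prefix, more bytes needed */
		return SEQ_INCOMPLETE;
	*lenp = r;			/* 0 for a null byte */
	return r == 0 ? SEQ_NUL : SEQ_VALID;
}
```

.Pp
A caller that gets
.Dv SEQ_INCOMPLETE
could instead keep the
.Vt mbstate_t
and pass the following bytes in a later call;
this sketch starts from a fresh state on every call for simplicity.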