.\" Copyright (c) 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" Paul Borman at Krystal Technologies.
.\"
.\" %sccs.include.redist.roff%
.\"
.\"	@(#)utf2.4	8.1 (Berkeley) 06/04/93
.\"
.Dd ""
.Dt UTF2 4
.Os
.Sh NAME
.Nm UTF2
.Nd "Universal character set Transformation Format encoding of runes"
.Sh SYNOPSIS
\fBENCODING "UTF2"\fP
.Sh DESCRIPTION
The
.Nm UTF2
encoding is based on a proposed X/Open multibyte
\s-1FSS-UCS-TF\s+1 (File System Safe Universal Character Set Transformation
Format) encoding as used in Plan 9 from Bell Labs.
Although the encoding is capable of representing more than 16 bits,
the current implementation is limited to 16 bits as defined by the
Unicode Standard.
.Pp
The
.Nm UTF2
representation is backwards compatible with ASCII, so 0x00-0x7f refer to the
ASCII character set.  The multibyte encoding of runes between 0x0080 and 0xffff
consists entirely of bytes whose high order bit is set.  The actual
encoding is represented by the following table:
.Bd -literal
[0x0000 - 0x007f] [00000000.0bbbbbbb] -> 0bbbbbbb
[0x0080 - 0x07ff] [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
[0x0800 - 0xffff] [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb
.Ed
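.Pp
As an illustration only, the table can be read as the following C sketch.
The function name
.Fn utf2_encode
is hypothetical and is not part of the library.
The two-byte form carries 11 payload bits, so the sketch moves to the
three-byte form at 0x0800.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libc): encode one 16-bit rune
 * into buf using the shortest form from the table.  Returns the
 * number of bytes written (1, 2 or 3); buf must hold 3 bytes.
 */
size_t
utf2_encode(unsigned int rune, unsigned char *buf)
{
	assert(buf != NULL);
	assert(rune <= 0xffff);

	if (rune < 0x80) {
		/* 0bbbbbbb */
		buf[0] = (unsigned char)rune;
		return (1);
	}
	if (rune < 0x800) {
		/* 110bbbbb, 10bbbbbb */
		buf[0] = (unsigned char)(0xc0 | (rune >> 6));
		buf[1] = (unsigned char)(0x80 | (rune & 0x3f));
		return (2);
	}
	/* 1110bbbb, 10bbbbbb, 10bbbbbb */
	buf[0] = (unsigned char)(0xe0 | (rune >> 12));
	buf[1] = (unsigned char)(0x80 | ((rune >> 6) & 0x3f));
	buf[2] = (unsigned char)(0x80 | (rune & 0x3f));
	return (3);
}
```
.Ed
.Pp
Under these assumptions, the rune 0x20ac would be written as the three
bytes 0xe2 0x82 0xac.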
.sp
If a value has more than one representation (for example, 0x00 can be
written as 0x00; 0xC0 0x80; or 0xE0 0x80 0x80), the shortest one is always
used when encoding, but the longer ones are still decoded correctly.
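.Pp
The decoding side of this rule can be sketched in C as follows.
This is an assumption-laden illustration, not the library's decoder:
.Fn utf2_decode
is a hypothetical name, and the sketch simply masks off the marker bits,
so the one-, two- and three-byte spellings of 0x00 all yield the same rune.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libc): decode one rune from the
 * first n bytes of s.  Stores the value in *rune and returns the
 * number of bytes consumed, or 0 if the sequence is malformed or
 * truncated.  Over-long forms are accepted, as the text describes.
 */
size_t
utf2_decode(const unsigned char *s, size_t n, unsigned int *rune)
{
	assert(s != NULL && rune != NULL);

	if (n >= 1 && (s[0] & 0x80) == 0) {
		*rune = s[0];
		return (1);
	}
	if (n >= 2 && (s[0] & 0xe0) == 0xc0 && (s[1] & 0xc0) == 0x80) {
		*rune = ((s[0] & 0x1fu) << 6) | (s[1] & 0x3fu);
		return (2);
	}
	if (n >= 3 && (s[0] & 0xf0) == 0xe0 &&
	    (s[1] & 0xc0) == 0x80 && (s[2] & 0xc0) == 0x80) {
		*rune = ((s[0] & 0x0fu) << 12) | ((s[1] & 0x3fu) << 6) |
		    (s[2] & 0x3fu);
		return (3);
	}
	return (0);
}
```
.Ed
.Pp
A stricter decoder could reject the longer spellings instead; accepting
them here mirrors the behaviour described above.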
.Pp
The final three encodings provided by X/Open, which cover the entire
proposed 31-bit ISO-10646 standard, are currently not implemented:
.Bd -literal
[00000000.000bbbbb.bbbbbbbb.bbbbbbbb] ->
	11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb

[000000bb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
	111110bb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb

[0bbbbbbb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
	1111110b, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
.Ed
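.Pp
For readers who want to see how the longer forms would be laid out, the
following C sketch extends the same bit-packing to all six lengths.
It is purely illustrative: the current implementation does not provide
these forms, and
.Fn utf2_encode31
is a hypothetical name.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libc): encode a value of up to
 * 31 bits using the shortest of the six forms.  Returns the number
 * of bytes written (1 to 6); buf must hold 6 bytes.
 */
size_t
utf2_encode31(unsigned long value, unsigned char *buf)
{
	/* Lead-byte markers for sequences of 2 to 6 bytes. */
	static const unsigned char lead[] = { 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };
	size_t len, i;

	assert(buf != NULL);
	assert(value <= 0x7fffffffUL);

	if (value < 0x80UL) {
		buf[0] = (unsigned char)value;
		return (1);
	}
	if (value < 0x800UL)
		len = 2;		/* 11 payload bits */
	else if (value < 0x10000UL)
		len = 3;		/* 16 payload bits */
	else if (value < 0x200000UL)
		len = 4;		/* 21 payload bits */
	else if (value < 0x4000000UL)
		len = 5;		/* 26 payload bits */
	else
		len = 6;		/* 31 payload bits */

	/* Fill continuation bytes from the end, six bits at a time. */
	for (i = len - 1; i > 0; i--) {
		buf[i] = (unsigned char)(0x80 | (value & 0x3f));
		value >>= 6;
	}
	/* The remaining high bits share the first byte with its marker. */
	buf[0] = (unsigned char)(lead[len - 2] | value);
	return (len);
}
```
.Ed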
.Sh "SEE ALSO"
.Xr mklocale 1 ,
.Xr setlocale 3