.\"     $NetBSD: nls.7,v 1.11 2003/06/26 11:55:56 wiz Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"        This product includes software developed by the NetBSD
.\"        Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd May 17, 2003
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides the commands and programming
interfaces that allow a single, worldwide operating system base to serve
users of many languages and cultures.
An internationalized system has no built-in assumptions or dependencies
on language-specific or culture-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.It
Message-text language
.It
Character sets
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
(often abbreviated
.Dq i18n )
refers to the process by which system software is developed to support
multiple culture-specific and language-specific conventions.
This is a generalization process that frees the system from relying
only on English strings or other English-specific conventions.
.Dq Localization
(often abbreviated
.Dq l10n )
refers to the process by which the user environment is customized so that
input and output follow the conventions of a specific language and culture.
This is a specialization process, in which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of the cultural conventions of a country or region,
together with all associated translations into its native language, is
called a
.Dq locale .
.Pp
.Dx
provides extensive support for programmers and system developers who
write internationalized software.
.Dx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
The following six standard categories, five of them specified by ISO C and
.Dv LC_MESSAGES
by POSIX, are supported by
.Dx :
.Pp
.Bl -tag -compact -width LC_MONETARYXX
.It LC_COLLATE
string-collation order information
.It LC_CTYPE
character classification, case conversion, and other character attributes
.It LC_MESSAGES
the format for affirmative and negative responses
.It LC_MONETARY
rules and symbols for formatting monetary numeric information
.It LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It LC_TIME
rules and symbols for formatting time and date information
.El
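.Pp
The setting of each category can be queried by passing a null pointer as
the locale argument of
.Xr setlocale 3 .
The following C fragment is a minimal sketch of such a query;
the table
.Va nls_categories
and the function
.Fn show_categories
are hypothetical names chosen for this example.
.Bd -literal -offset indent
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table pairing each category with its name. */
static const struct {
	int		 category;
	const char	*name;
} nls_categories[] = {
	{ LC_COLLATE,	"LC_COLLATE" },
	{ LC_CTYPE,	"LC_CTYPE" },
	{ LC_MESSAGES,	"LC_MESSAGES" },
	{ LC_MONETARY,	"LC_MONETARY" },
	{ LC_NUMERIC,	"LC_NUMERIC" },
	{ LC_TIME,	"LC_TIME" },
};

#define NLS_NCATEGORIES	(sizeof(nls_categories) / sizeof(nls_categories[0]))

/*
 * Hypothetical helper: print the locale in effect for each category
 * and return how many categories use the locale named "want".
 * A NULL locale argument makes setlocale(3) query, not change, the
 * current setting.
 */
size_t
show_categories(const char *want)
{
	size_t i, matches = 0;
	const char *cur;

	for (i = 0; i < NLS_NCATEGORIES; i++) {
		cur = setlocale(nls_categories[i].category, NULL);
		if (cur == NULL)
			cur = "(unknown)";
		printf("%-12s ", nls_categories[i].name);
		puts(cur);
		if (strcmp(cur, want) == 0)
			matches++;
	}
	return (matches);
}
.Ed
.Pp
Because each category is set independently, the names printed need not
all be the same.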
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of pathname templates
used by
.Xr catopen 3
to locate the message catalog files of the NLS database;
within a template,
.Ql %N
is replaced by the catalog name and
.Ql %L
by the locale name.
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale.
.Pp
The values of these environment variables have the format:
.Bd -literal
	language[_territory][.codeset][@modifier]
.Ed
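.Pp
As an illustration, the following C fragment is a minimal sketch of how
such a value can be split into its four fields.
The structure
.Vt locale_name
and the functions
.Fn copy_field
and
.Fn parse_locale_name
are hypothetical; programs normally pass the whole string to
.Xr setlocale 3
rather than parsing it themselves.
.Bd -literal -offset indent
#include <string.h>

/* Hypothetical container for the fields of a locale name. */
struct locale_name {
	char	language[16];
	char	territory[16];
	char	codeset[32];
	char	modifier[32];
};

/* Copy the bytes in [begin, end) into dst; fail if they do not fit. */
static int
copy_field(char *dst, size_t dstlen, const char *begin, const char *end)
{
	size_t len = (size_t)(end - begin);

	if (len >= dstlen)
		return (-1);
	memcpy(dst, begin, len);
	dst[len] = 0;
	return (0);
}

/*
 * Hypothetical parser for language[_territory][.codeset][@modifier].
 * Absent optional fields are left as empty strings.
 * Returns 0 on success, -1 on an empty language or an overlong field.
 */
int
parse_locale_name(const char *name, struct locale_name *ln)
{
	const char *p = name, *end;

	memset(ln, 0, sizeof(*ln));

	/* The language runs up to the first '_', '.' or '@'. */
	end = p + strcspn(p, "_.@");
	if (end == p ||
	    copy_field(ln->language, sizeof(ln->language), p, end) == -1)
		return (-1);
	p = end;

	if (*p == '_') {		/* optional territory */
		p++;
		end = p + strcspn(p, ".@");
		if (copy_field(ln->territory, sizeof(ln->territory), p, end) == -1)
			return (-1);
		p = end;
	}
	if (*p == '.') {		/* optional codeset */
		p++;
		end = p + strcspn(p, "@");
		if (copy_field(ln->codeset, sizeof(ln->codeset), p, end) == -1)
			return (-1);
		p = end;
	}
	if (*p == '@') {		/* optional modifier */
		p++;
		end = p + strlen(p);
		if (copy_field(ln->modifier, sizeof(ln->modifier), p, end) == -1)
			return (-1);
	}
	return (0);
}
.Ed
.Pp
For example, parsing da_DK.ISO8859-1 yields the language da, the
territory DK, the codeset ISO8859-1, and an empty modifier.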
.Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-letter codes for many languages.
Some common language codes are:
.Pp
.nf
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
\fILanguage Name\fP	\fICode\fP	\fILanguage Family\fP
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
.sp 5p
ABKHAZIAN	AB	IBERO-CAUCASIAN
AFAN (OROMO)	OM	HAMITIC
AFAR	AA	HAMITIC
AFRIKAANS	AF	GERMANIC
ALBANIAN	SQ	INDO-EUROPEAN (OTHER)
AMHARIC	AM	SEMITIC
ARABIC	AR	SEMITIC
ARMENIAN	HY	INDO-EUROPEAN (OTHER)
ASSAMESE	AS	INDIAN
AYMARA	AY	AMERINDIAN
AZERBAIJANI	AZ	TURKIC/ALTAIC
BASHKIR	BA	TURKIC/ALTAIC
BASQUE	EU	BASQUE
BENGALI	BN	INDIAN
BHUTANI	DZ	ASIAN
BIHARI	BH	INDIAN
BISLAMA	BI
BRETON	BR	CELTIC
BULGARIAN	BG	SLAVIC
BURMESE	MY	ASIAN
BYELORUSSIAN	BE	SLAVIC
CAMBODIAN	KM	ASIAN
CATALAN	CA	ROMANCE
CHINESE	ZH	ASIAN
CORSICAN	CO	ROMANCE
CROATIAN	HR	SLAVIC
CZECH	CS	SLAVIC
DANISH	DA	GERMANIC
DUTCH	NL	GERMANIC
ENGLISH	EN	GERMANIC
ESPERANTO	EO	INTERNATIONAL AUX.
ESTONIAN	ET	FINNO-UGRIC
FAROESE	FO	GERMANIC
FIJI	FJ	OCEANIC/INDONESIAN
FINNISH	FI	FINNO-UGRIC
FRENCH	FR	ROMANCE
FRISIAN	FY	GERMANIC
GALICIAN	GL	ROMANCE
GEORGIAN	KA	IBERO-CAUCASIAN
GERMAN	DE	GERMANIC
GREEK	EL	LATIN/GREEK
GREENLANDIC	KL	ESKIMO
GUARANI	GN	AMERINDIAN
GUJARATI	GU	INDIAN
HAUSA	HA	NEGRO-AFRICAN
HEBREW	HE	SEMITIC
HINDI	HI	INDIAN
HUNGARIAN	HU	FINNO-UGRIC
ICELANDIC	IS	GERMANIC
INDONESIAN	ID	OCEANIC/INDONESIAN
INTERLINGUA	IA	INTERNATIONAL AUX.
INTERLINGUE	IE	INTERNATIONAL AUX.
INUKTITUT	IU
INUPIAK	IK	ESKIMO
IRISH	GA	CELTIC
ITALIAN	IT	ROMANCE
JAPANESE	JA	ASIAN
JAVANESE	JV	OCEANIC/INDONESIAN
KANNADA	KN	DRAVIDIAN
KASHMIRI	KS	INDIAN
KAZAKH	KK	TURKIC/ALTAIC
KINYARWANDA	RW	NEGRO-AFRICAN
KIRGHIZ	KY	TURKIC/ALTAIC
KOREAN	KO	ASIAN
KURDISH	KU	IRANIAN
KURUNDI	RN	NEGRO-AFRICAN
LAOTHIAN	LO	ASIAN
LATIN	LA	LATIN/GREEK
LATVIAN	LV	BALTIC
LINGALA	LN	NEGRO-AFRICAN
LITHUANIAN	LT	BALTIC
MACEDONIAN	MK	SLAVIC
MALAGASY	MG	OCEANIC/INDONESIAN
MALAY	MS	OCEANIC/INDONESIAN
MALAYALAM	ML	DRAVIDIAN
MALTESE	MT	SEMITIC
MAORI	MI	OCEANIC/INDONESIAN
MARATHI	MR	INDIAN
MOLDAVIAN	MO	ROMANCE
MONGOLIAN	MN
NAURU	NA
NEPALI	NE	INDIAN
NORWEGIAN	NO	GERMANIC
OCCITAN	OC	ROMANCE
ORIYA	OR	INDIAN
PASHTO	PS	IRANIAN
PERSIAN (FARSI)	FA	IRANIAN
POLISH	PL	SLAVIC
PORTUGUESE	PT	ROMANCE
PUNJABI	PA	INDIAN
QUECHUA	QU	AMERINDIAN
RHAETO-ROMANCE	RM	ROMANCE
ROMANIAN	RO	ROMANCE
RUSSIAN	RU	SLAVIC
SAMOAN	SM	OCEANIC/INDONESIAN
SANGHO	SG	NEGRO-AFRICAN
SANSKRIT	SA	INDIAN
SCOTS GAELIC	GD	CELTIC
SERBIAN	SR	SLAVIC
SERBO-CROATIAN	SH	SLAVIC
SESOTHO	ST	NEGRO-AFRICAN
SETSWANA	TN	NEGRO-AFRICAN
SHONA	SN	NEGRO-AFRICAN
SINDHI	SD	INDIAN
SINGHALESE	SI	INDIAN
SISWATI	SS	NEGRO-AFRICAN
SLOVAK	SK	SLAVIC
SLOVENIAN	SL	SLAVIC
SOMALI	SO	HAMITIC
SPANISH	ES	ROMANCE
SUNDANESE	SU	OCEANIC/INDONESIAN
SWAHILI	SW	NEGRO-AFRICAN
SWEDISH	SV	GERMANIC
TAGALOG	TL	OCEANIC/INDONESIAN
TAJIK	TG	IRANIAN
TAMIL	TA	DRAVIDIAN
TATAR	TT	TURKIC/ALTAIC
TELUGU	TE	DRAVIDIAN
THAI	TH	ASIAN
TIBETAN	BO	ASIAN
TIGRINYA	TI	SEMITIC
TONGA	TO	OCEANIC/INDONESIAN
TSONGA	TS	NEGRO-AFRICAN
TURKISH	TR	TURKIC/ALTAIC
TURKMEN	TK	TURKIC/ALTAIC
TWI	TW	NEGRO-AFRICAN
UIGUR	UG
UKRAINIAN	UK	SLAVIC
URDU	UR	INDIAN
UZBEK	UZ	TURKIC/ALTAIC
VIETNAMESE	VI	ASIAN
VOLAPUK	VO	INTERNATIONAL AUX.
WELSH	CY	CELTIC
WOLOF	WO	NEGRO-AFRICAN
XHOSA	XH	NEGRO-AFRICAN
YIDDISH	YI	GERMANIC
YORUBA	YO	NEGRO-AFRICAN
ZHUANG	ZA
ZULU	ZU	NEGRO-AFRICAN
.ta
.fi
.Pp
For example, the locale for the Danish language as spoken in Denmark,
using the ISO8859-1 character set, is da_DK.ISO8859-1.
Here da stands for the Danish language and DK stands for Denmark.
The short form da_DK is sufficient to indicate this locale.
.Pp
The environment variables are consulted in the following order of
precedence:
.Bl -bullet
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the locale for that category.
Only the
.Ev LANG
environment variable should be set in
.Pa /etc/profile ,
since this makes it easiest for users to override the system default with
the individual
.Ev LC_*
variables.
.It
If the
.Ev LC_ALL ,
.Ev LC_* ,
and
.Ev LANG
environment variables are all unset for a particular category,
the locale for that category defaults to the C locale.
The C or POSIX locale assumes the 7-bit ASCII character set and defines
information for all six categories.
.El
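.Pp
The lookup order described above can be sketched in C as follows.
The function
.Fn effective_locale
is hypothetical and only mirrors the rules listed; it assumes, as POSIX
specifies, that a variable set to the empty string is treated as unset.
A real program calls
.Xr setlocale 3
with an empty locale string and lets the C library perform the lookup.
.Bd -literal -offset indent
#include <stdlib.h>

/*
 * Hypothetical helper mirroring the lookup order above for a single
 * category, for example effective_locale("LC_TIME").  A variable that
 * is set to the empty string is treated as unset, as POSIX specifies.
 */
const char *
effective_locale(const char *category_var)
{
	const char *names[3];
	const char *v;
	int i;

	names[0] = "LC_ALL";		/* overrides every category */
	names[1] = category_var;	/* per-category setting */
	names[2] = "LANG";		/* default for unset categories */
	for (i = 0; i < 3; i++) {
		v = getenv(names[i]);
		if (v != NULL && v[0] != 0)
			return (v);
	}
	return ("C");			/* fall back to the C locale */
}
.Ed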
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a
particular language makes up a character set.
It is the encoding values in a character set that provide
the interface between the system and its input and output devices.
.Pp
The following character sets are supported in
.Dx :
.Bl -tag -width ISO8859_family
.It ISO8859 family
Industry-standard character sets are provided by means of the ISO8859
family of character sets, which provide a range of single-byte character set
support that includes Latin-1, Latin-2, Arabic, Cyrillic, Hebrew,
Greek, and Turkish.
.It eucJP
The eucJP character set is the industry-standard character set used to support
the Japanese locale.
.It Unicode
A Unicode environment based on the UTF-8 encoding is supported for all
supported language/territory combinations.
UTF-8 provides character support for most of the major languages of the
world and can be used in environments where multiple languages must be
processed simultaneously.
.El
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for the
corresponding characters in a character set.
A display must have a suitable font available to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
.Xr xterm 1
includes support for the UTF-8 encoding.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Dx
.Xr syscons 4
console provides support for loading a variety of fonts using the
.Xr vidcontrol 1
utility.
Available fonts can be found in
.Pa /usr/share/syscons/fonts .
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
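.Pp
A minimal sketch of these two interfaces follows.
Calling
.Xr setlocale 3
with an empty string selects the locale named by the environment, as
described above, after which
.Xr nl_langinfo 3
reports items of the selected locale, such as the codeset name
.Pq Dv CODESET ,
an
.Dv LC_CTYPE
item, and the pattern for affirmative responses
.Pq Dv YESEXPR ,
an
.Dv LC_MESSAGES
item.
The functions
.Fn init_locale
and
.Fn yes_pattern
are hypothetical.
.Bd -literal -offset indent
#include <langinfo.h>
#include <locale.h>

/*
 * Hypothetical start-up helper: adopt the locale selected by the
 * environment (LC_ALL, then LC_*, then LANG) and return the name of
 * the codeset used by the resulting LC_CTYPE locale, or NULL if
 * setlocale(3) cannot honor the request, for example because the
 * environment names a locale that is not installed.
 */
const char *
init_locale(void)
{
	/* An empty string asks setlocale(3) to consult the environment. */
	if (setlocale(LC_ALL, "") == NULL)
		return (NULL);
	/* CODESET is an LC_CTYPE item, for example "UTF-8". */
	return (nl_langinfo(CODESET));
}

/* Hypothetical helper: the LC_MESSAGES regular expression for "yes". */
const char *
yes_pattern(void)
{
	return (nl_langinfo(YESEXPR));
}
.Ed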
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages as needed.
.Pp
.Dx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a standard which is
well supported.
Unfortunately, the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also does not support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
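.Pp
The
.Xr catgets 3
interface can be sketched as follows.
The catalog name hello, the set and message numbers, and the function
.Fn get_message
are hypothetical; such a catalog would be compiled from a message source
file with
.Xr gencat 1 .
.Xr catgets 3
returns the supplied default string when the message is not found, and
the sketch also falls back to it when the catalog cannot be opened, so the
program still works without a translation.
.Bd -literal -offset indent
#include <nl_types.h>
#include <stdio.h>

/*
 * Hypothetical helper: copy message "msgno" of set 1 from the example
 * catalog "hello" into buf.  If the catalog or message is missing,
 * the default text "def" is used instead.
 */
void
get_message(int msgno, const char *def, char *buf, size_t buflen)
{
	nl_catd cat;
	const char *msg = def;

	/* NL_CAT_LOCALE selects the catalog by the LC_MESSAGES locale. */
	cat = catopen("hello", NL_CAT_LOCALE);
	if (cat != (nl_catd)-1)
		msg = catgets(cat, 1, msgno, def);
	/* Copy before catclose(3) can invalidate the returned string. */
	snprintf(buf, buflen, "%s", msg);
	if (cat != (nl_catd)-1)
		catclose(cat);
}
.Ed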
.Ss Support for Multibyte Characters and Wide Characters
Character sets with multibyte characters may be difficult to decode, or may
contain state (that is, the meaning of a byte may depend on the bytes that
precede it).
ISO C specifies a set of functions using
.Dq wide characters
which can handle multibyte characters properly.
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
.Pp
There are two types for wide characters:
.Em wchar_t
and
.Em wint_t .
.Em wchar_t
can hold one wide character and plays the same role for wide characters that
.Em char
plays for single-byte characters.
.Em wint_t
can hold one wide character or the value
.Dv WEOF
(wide EOF).
.Pp
Many functions operate on
.Em wchar_t
and substitute for the corresponding functions operating on
.Em char .
See
.Xr wmemchr 3
and
.Xr towlower 3
for details.
There are also some additional functions that operate on
.Em wchar_t .
See
.Xr wctype 3
and
.Xr wctrans 3
for details.
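.Pp
The following C fragment is a minimal sketch using
.Xr wctype 3 ,
.Fn iswctype ,
and
.Xr towlower 3 .
The functions
.Fn count_class
and
.Fn fold_lower
are hypothetical, and their results depend on the
.Dv LC_CTYPE
category of the current locale.
.Bd -literal -offset indent
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: count the characters of ws that belong to the
 * class named by classname (for example "alpha" or "digit").
 * Returns -1 if wctype(3) does not recognize the class name.
 */
long
count_class(const wchar_t *ws, const char *classname)
{
	wctype_t type = wctype(classname);
	long n = 0;

	if (type == 0)
		return (-1);
	for (; *ws != 0; ws++)
		if (iswctype(*ws, type))
			n++;
	return (n);
}

/* Hypothetical helper: fold ws to lower case in place with towlower(3). */
void
fold_lower(wchar_t *ws)
{
	for (; *ws != 0; ws++)
		*ws = (wchar_t)towlower(*ws);
}
.Ed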
.Pp
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
There are two primary approaches to I/O with wide characters:
.Bl -bullet -offset indent
.It
All I/O is performed using multibyte characters.
Input data is converted into wide characters immediately after
reading, and data for output is converted from wide characters to
multibyte characters immediately before writing.
Conversion is achieved using
.Xr mbstowcs 3 ,
.Xr mbsrtowcs 3 ,
.Xr wcstombs 3 ,
.Xr wcsrtombs 3 ,
.Xr mblen 3 ,
.Xr mbrlen 3 ,
and
.Xr mbsinit 3 .
.It
Wide characters are used directly for I/O, using
.Xr getwchar 3 ,
.Xr fgetwc 3 ,
.Xr getwc 3 ,
.Xr ungetwc 3 ,
.Xr fgetws 3 ,
.Xr putwchar 3 ,
.Xr fputwc 3 ,
.Xr putwc 3 ,
and
.Xr fputws 3 .
They are also used by the formatted I/O functions for wide characters,
such as
.Xr fwscanf 3 ,
.Xr wscanf 3 ,
.Xr swscanf 3 ,
.Xr fwprintf 3 ,
.Xr wprintf 3 ,
.Xr swprintf 3 ,
.Xr vfwprintf 3 ,
.Xr vwprintf 3 ,
and
.Xr vswprintf 3 ,
and by the %lc, %C, %ls, and %S conversion specifiers of the conventional
formatted I/O functions.
.El
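.Pp
The first approach can be sketched in C as follows.
The function
.Fn mb_roundtrip
and the buffer limit
.Dv WBUF_LEN
are hypothetical.
The conversions follow the
.Dv LC_CTYPE
category, so a program must call
.Xr setlocale 3 ,
for example with an empty locale string, before they honor the user's
locale; until then the C locale is in effect.
.Bd -literal -offset indent
#include <stdlib.h>
#include <wchar.h>

#define WBUF_LEN	256	/* hypothetical limit for this sketch */

/*
 * Hypothetical helper for the first approach above: convert the
 * multibyte string "in" to wide characters, where any per-character
 * processing would take place, then convert the result back into
 * "out".  Returns the number of characters (not bytes) in "in", or -1
 * if "in" is invalid in the current LC_CTYPE locale or a buffer is
 * too small.
 */
long
mb_roundtrip(const char *in, char *out, size_t outlen)
{
	wchar_t wbuf[WBUF_LEN];
	size_t nchars, nbytes;

	nchars = mbstowcs(wbuf, in, WBUF_LEN);
	/* A result of WBUF_LEN means wbuf was not NUL-terminated. */
	if (nchars == (size_t)-1 || nchars == WBUF_LEN)
		return (-1);

	/* Wide-character processing of wbuf would happen here. */

	nbytes = wcstombs(out, wbuf, outlen);
	/* Likewise, a result of outlen means out was not terminated. */
	if (nbytes == (size_t)-1 || nbytes == outlen)
		return (-1);
	return ((long)nchars);
}
.Ed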
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr vidcontrol 1 ,
.Xr xfd 1 ,
.Xr xterm 1 ,
.Xr catgets 3 ,
.Xr gettext 3 Pq Pa devel/gettext ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3
.Sh BUGS
This man page is incomplete.