.\"     $NetBSD: nls.7,v 1.11 2003/06/26 11:55:56 wiz Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"        This product includes software developed by the NetBSD
.\"        Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\" $DragonFly: src/share/man/man7/nls.7,v 1.5 2006/10/14 23:59:59 swildner Exp $
.\"
.Dd May 17, 2003
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides commands for a single
worldwide operating system base.
An internationalized system has no built-in assumptions or dependencies
on language-specific or cultural-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.It
Message-text language
.It
Character sets
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
(often abbreviated
.Dq i18n )
refers to the operation by which system software is developed to support
multiple cultural-specific and language-specific conventions.
This is a generalization process by which the system is untied from
calling only English strings or other English-specific conventions.
.Dq Localization
(often abbreviated
.Dq l10n )
refers to the operations by which the user environment is customized to
handle its input and output appropriate for specific language and cultural
conventions.
This is a specialization process, by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of cultural conventions for some country, together
with all associated translations targeted to the native language, is
called the
.Dq locale .
.Pp
.Dx
provides extensive support to programmers and system developers to
enable internationalized software to be developed.
.Dx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
The following six standard categories are supported by
.Dx .
ISO C defines all of them except LC_MESSAGES, which comes from POSIX:
.Pp
.Bl -tag -compact -width LC_MONETARYXX
.It LC_COLLATE
string-collation order information
.It LC_CTYPE
character classification, case conversion, and other character attributes
.It LC_MESSAGES
the format for affirmative and negative responses
.It LC_MONETARY
rules and symbols for formatting monetary numeric information
.It LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of directory names
where the message catalog files of the NLS database are located.
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale.
.Pp
The value of each of these environment variables is a string of the form:
.Pp
.Bd -literal
	language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-character codes for many languages.
Some common language codes are:
.Pp
.nf
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
\fILanguage Name\fP	\fICode\fP	\fILanguage Family\fP
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
.sp 5p
ABKHAZIAN	AB	IBERO-CAUCASIAN
AFAN (OROMO)	OM	HAMITIC
AFAR	AA	HAMITIC
AFRIKAANS	AF	GERMANIC
ALBANIAN	SQ	INDO-EUROPEAN (OTHER)
AMHARIC	AM	SEMITIC
ARABIC	AR	SEMITIC
ARMENIAN	HY	INDO-EUROPEAN (OTHER)
ASSAMESE	AS	INDIAN
AYMARA	AY	AMERINDIAN
AZERBAIJANI	AZ	TURKIC/ALTAIC
BASHKIR	BA	TURKIC/ALTAIC
BASQUE	EU	BASQUE
BENGALI	BN	INDIAN
BHUTANI	DZ	ASIAN
BIHARI	BH	INDIAN
BISLAMA	BI
BRETON	BR	CELTIC
BULGARIAN	BG	SLAVIC
BURMESE	MY	ASIAN
BYELORUSSIAN	BE	SLAVIC
CAMBODIAN	KM	ASIAN
CATALAN	CA	ROMANCE
CHINESE	ZH	ASIAN
CORSICAN	CO	ROMANCE
CROATIAN	HR	SLAVIC
CZECH	CS	SLAVIC
DANISH	DA	GERMANIC
DUTCH	NL	GERMANIC
ENGLISH	EN	GERMANIC
ESPERANTO	EO	INTERNATIONAL AUX.
ESTONIAN	ET	FINNO-UGRIC
FAROESE	FO	GERMANIC
FIJI	FJ	OCEANIC/INDONESIAN
FINNISH	FI	FINNO-UGRIC
FRENCH	FR	ROMANCE
FRISIAN	FY	GERMANIC
GALICIAN	GL	ROMANCE
GEORGIAN	KA	IBERO-CAUCASIAN
GERMAN	DE	GERMANIC
GREEK	EL	LATIN/GREEK
GREENLANDIC	KL	ESKIMO
GUARANI	GN	AMERINDIAN
GUJARATI	GU	INDIAN
HAUSA	HA	NEGRO-AFRICAN
HEBREW	HE	SEMITIC
HINDI	HI	INDIAN
HUNGARIAN	HU	FINNO-UGRIC
ICELANDIC	IS	GERMANIC
INDONESIAN	ID	OCEANIC/INDONESIAN
INTERLINGUA	IA	INTERNATIONAL AUX.
INTERLINGUE	IE	INTERNATIONAL AUX.
INUKTITUT	IU
INUPIAK	IK	ESKIMO
IRISH	GA	CELTIC
ITALIAN	IT	ROMANCE
JAPANESE	JA	ASIAN
JAVANESE	JV	OCEANIC/INDONESIAN
KANNADA	KN	DRAVIDIAN
KASHMIRI	KS	INDIAN
KAZAKH	KK	TURKIC/ALTAIC
KINYARWANDA	RW	NEGRO-AFRICAN
KIRGHIZ	KY	TURKIC/ALTAIC
KURUNDI	RN	NEGRO-AFRICAN
KOREAN	KO	ASIAN
KURDISH	KU	IRANIAN
LAOTHIAN	LO	ASIAN
LATIN	LA	LATIN/GREEK
LATVIAN	LV	BALTIC
LINGALA	LN	NEGRO-AFRICAN
LITHUANIAN	LT	BALTIC
MACEDONIAN	MK	SLAVIC
MALAGASY	MG	OCEANIC/INDONESIAN
MALAY	MS	OCEANIC/INDONESIAN
MALAYALAM	ML	DRAVIDIAN
MALTESE	MT	SEMITIC
MAORI	MI	OCEANIC/INDONESIAN
MARATHI	MR	INDIAN
MOLDAVIAN	MO	ROMANCE
MONGOLIAN	MN
NAURU	NA
NEPALI	NE	INDIAN
NORWEGIAN	NO	GERMANIC
OCCITAN	OC	ROMANCE
ORIYA	OR	INDIAN
PASHTO	PS	IRANIAN
PERSIAN (farsi)	FA	IRANIAN
POLISH	PL	SLAVIC
PORTUGUESE	PT	ROMANCE
PUNJABI	PA	INDIAN
QUECHUA	QU	AMERINDIAN
RHAETO-ROMANCE	RM	ROMANCE
ROMANIAN	RO	ROMANCE
RUSSIAN	RU	SLAVIC
SAMOAN	SM	OCEANIC/INDONESIAN
SANGHO	SG	NEGRO-AFRICAN
SANSKRIT	SA	INDIAN
SCOTS GAELIC	GD	CELTIC
SERBIAN	SR	SLAVIC
SERBO-CROATIAN	SH	SLAVIC
SESOTHO	ST	NEGRO-AFRICAN
SETSWANA	TN	NEGRO-AFRICAN
SHONA	SN	NEGRO-AFRICAN
SINDHI	SD	INDIAN
SINGHALESE	SI	INDIAN
SISWATI	SS	NEGRO-AFRICAN
SLOVAK	SK	SLAVIC
SLOVENIAN	SL	SLAVIC
SOMALI	SO	HAMITIC
SPANISH	ES	ROMANCE
SUNDANESE	SU	OCEANIC/INDONESIAN
SWAHILI	SW	NEGRO-AFRICAN
SWEDISH	SV	GERMANIC
TAGALOG	TL	OCEANIC/INDONESIAN
TAJIK	TG	IRANIAN
TAMIL	TA	DRAVIDIAN
TATAR	TT	TURKIC/ALTAIC
TELUGU	TE	DRAVIDIAN
THAI	TH	ASIAN
TIBETAN	BO	ASIAN
TIGRINYA	TI	SEMITIC
TONGA	TO	OCEANIC/INDONESIAN
TSONGA	TS	NEGRO-AFRICAN
TURKISH	TR	TURKIC/ALTAIC
TURKMEN	TK	TURKIC/ALTAIC
TWI	TW	NEGRO-AFRICAN
UIGUR	UG
UKRAINIAN	UK	SLAVIC
URDU	UR	INDIAN
UZBEK	UZ	TURKIC/ALTAIC
VIETNAMESE	VI	ASIAN
VOLAPUK	VO	INTERNATIONAL AUX.
WELSH	CY	CELTIC
WOLOF	WO	NEGRO-AFRICAN
XHOSA	XH	NEGRO-AFRICAN
YIDDISH	YI	GERMANIC
YORUBA	YO	NEGRO-AFRICAN
ZHUANG	ZA
ZULU	ZU	NEGRO-AFRICAN
.ta
.fi
.Pp
For example, the locale for the Danish language spoken in Denmark
using the ISO8859-1 character set is da_DK.ISO8859-1.
The da stands for the Danish language and the DK stands for Denmark.
The short form of da_DK is sufficient to indicate this locale.
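.Pp
The locale name format can be illustrated with a small parser.
The following C fragment is only a sketch: the structure
.Vt struct locale_parts ,
the helper
.Fn copy_field ,
the function
.Fn parse_locale_name ,
and the field sizes are hypothetical names chosen for this example and
are not part of any system interface.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for the four fields of a locale name. */
struct locale_parts {
	char language[16];
	char territory[16];
	char codeset[32];
	char modifier[32];
};

/* Copy at most dstsize-1 bytes starting at src into dst and terminate. */
static void
copy_field(char *dst, size_t dstsize, const char *src, size_t len)
{
	if (len >= dstsize)
		len = dstsize - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
}

/*
 * Split "language[_territory][.codeset][@modifier]" into its fields.
 * Missing fields are left as empty strings.  Returns 0 on success and
 * -1 if the language field is empty.
 */
int
parse_locale_name(const char *name, struct locale_parts *out)
{
	const char *end, *at, *mod_start, *dot, *cs_start, *us, *lang_end;

	assert(name != NULL && out != NULL);
	end = name + strlen(name);
	/* The modifier, if any, starts at the first '@'. */
	at = strchr(name, '@');
	mod_start = at ? at : end;
	/* The codeset, if any, starts at the first '.' before the modifier. */
	dot = memchr(name, '.', (size_t)(mod_start - name));
	cs_start = dot ? dot : mod_start;
	/* The territory, if any, starts at the first '_' before the codeset. */
	us = memchr(name, '_', (size_t)(cs_start - name));
	lang_end = us ? us : cs_start;

	memset(out, 0, sizeof(*out));
	if (lang_end == name)
		return -1;
	copy_field(out->language, sizeof(out->language), name,
	    (size_t)(lang_end - name));
	if (us)
		copy_field(out->territory, sizeof(out->territory), us + 1,
		    (size_t)(cs_start - (us + 1)));
	if (dot)
		copy_field(out->codeset, sizeof(out->codeset), dot + 1,
		    (size_t)(mod_start - (dot + 1)));
	if (at)
		copy_field(out->modifier, sizeof(out->modifier), at + 1,
		    (size_t)(end - (at + 1)));
	return 0;
}
```

Given da_DK.ISO8859-1, this sketch yields the language da, the
territory DK, and the codeset ISO8859-1, and leaves the modifier empty.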
.Pp
The environment variable settings are queried by their priority level
in the following manner:
.Pp
.Bl -bullet
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
Only the
.Ev LANG
environment variable should be set in
.Pa /etc/profile ,
because this makes it easiest for the user to override the system
default using the individual
.Ev LC_*
variables.
.It
If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the 7-bit ASCII character set and defines
information for the six categories.
.El
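.Pp
The lookup order in the list above can be sketched in C as follows.
The function name
.Fn effective_locale
is hypothetical, and the sketch treats an empty value as unset, which is
an assumption taken from POSIX rather than from this page.
Programs normally do not need such code, because
.Xr setlocale 3
performs the lookup itself.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Return the locale name selected for one category by the lookup order
 * LC_ALL, then the category's own variable, then LANG, then "C".
 * category_var is the name of the category's environment variable, for
 * example "LC_TIME".  Empty values are skipped as if unset (assumption).
 */
const char *
effective_locale(const char *category_var)
{
	const char *names[3];
	const char *value;
	size_t i;

	assert(category_var != NULL);
	names[0] = "LC_ALL";
	names[1] = category_var;
	names[2] = "LANG";
	for (i = 0; i < 3; i++) {
		value = getenv(names[i]);
		if (value != NULL && value[0] != '\0')
			return value;
	}
	/* Nothing is set, so the category falls back to the C locale. */
	return "C";
}
```

With only
.Ev LANG
set to da_DK.ISO8859-1, a call such as effective_locale("LC_TIME")
returns that value; setting
.Ev LC_ALL
afterwards overrides it for every category.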
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a
particular language makes up a character set.
The encoding values in a character set provide
the interface between the system and its input and output devices.
.Pp
The following character sets are supported in
.Dx :
.Bl -tag -width ISO8859_family
.It ISO8859 family
Industry-standard character sets are provided by means of the ISO8859
family of character sets, which provides a range of single-byte character set
support that includes Latin-1, Latin-2, Arabic, Cyrillic, Hebrew,
Greek, and Turkish.
.It eucJP
The eucJP character set is the industry-standard character set used to support
the Japanese locale.
.It Unicode
A Unicode environment based on the UTF-8 character set is supported for all
supported language/territory combinations.
UTF-8 provides character support for most of the major languages of the
world and can be used in environments where multiple languages must be
processed simultaneously.
.El
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for a
corresponding character in a character set.
A display must support a suitable font to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
.Xr xterm 1
includes support for UTF-8 character sets.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Dx
.Xr syscons 4
console provides support for loading a variety of fonts using the
.Xr vidcontrol 1
utility.
Available fonts can be found in
.Pa /usr/share/syscons/fonts .
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
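.Pp
As a minimal sketch of how these two interfaces fit together, the
following C function selects a locale and asks for the name of its
codeset.
The function name
.Fn codeset_for
is hypothetical and used only for illustration.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>

/*
 * Select locale_name for all categories and return the name of its
 * codeset.  Passing "" asks setlocale(3) to consult the environment
 * variables.  Returns NULL if the requested locale cannot be selected.
 */
const char *
codeset_for(const char *locale_name)
{
	assert(locale_name != NULL);
	if (setlocale(LC_ALL, locale_name) == NULL)
		return NULL;
	/* nl_langinfo(3) reports items from the locale now in effect. */
	return nl_langinfo(CODESET);
}
```

A program would typically call codeset_for("") once at startup so that
the user's environment decides the locale.
Other items, such as DAY_1 for the name of the first day of the week,
can be queried through
.Xr nl_langinfo 3
in the same way.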
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages, as needed.
.Pp
.Dx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a standard which is
well supported.
Unfortunately, the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also doesn't support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
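.Pp
A message lookup through the
.Xr catgets 3
interface might look like the following sketch.
The function name
.Fn lookup_message
and any catalog name or message numbering passed to it are hypothetical;
a real catalog would first have to be built from a message source file
with
.Xr gencat 1 .

```c
#include <assert.h>
#include <nl_types.h>
#include <stdio.h>

/*
 * Copy message msg_id from set set_id of the named catalog into buf.
 * If the catalog cannot be opened or the message is missing, the
 * caller's fallback text is copied instead, so the program still has
 * something to display.
 */
void
lookup_message(const char *catalog, int set_id, int msg_id,
    const char *fallback, char *buf, size_t buflen)
{
	nl_catd catd;

	assert(catalog != NULL && fallback != NULL);
	assert(buf != NULL && buflen > 0);
	catd = catopen(catalog, NL_CAT_LOCALE);
	if (catd == (nl_catd)-1) {
		snprintf(buf, buflen, "%s", fallback);
		return;
	}
	/* catgets(3) returns the fallback pointer when the lookup fails. */
	snprintf(buf, buflen, "%s", catgets(catd, set_id, msg_id, fallback));
	/* The text was copied, so the catalog can be closed safely. */
	catclose(catd);
}
```

The message is copied into the caller's buffer because the pointer
returned by
.Xr catgets 3
should not be relied upon after the catalog has been closed.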
.Ss Support for Multibyte Characters and Wide Characters
Character sets with multibyte characters may be difficult to decode, or may
contain state (i.e., adjacent characters are dependent).
ISO C specifies a set of functions using
.Dq wide characters
which can handle multibyte characters properly.
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
.Pp
There are two types for wide characters:
.Em wchar_t
and
.Em wint_t .
.Em wchar_t
is a type which can contain one wide character and plays the same role
that the
.Em char
type plays for a single-byte character.
.Em wint_t
can contain one wide character or WEOF (wide EOF).
.Pp
There are functions that operate on
.Em wchar_t
and substitute for functions operating on
.Em char .
See
.Xr wmemchr 3
and
.Xr towlower 3
for details.
There are some additional functions that operate on
.Em wchar_t .
See
.Xr wctype 3
and
.Xr wctrans 3
for details.
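.Pp
The additional functions work with character classes and mappings that
are looked up by name.
The following C sketch shows the pattern; the function names
.Fn count_in_class
and
.Fn map_wide_string
are hypothetical.

```c
#include <assert.h>
#include <wchar.h>
#include <wctype.h>

/* Count the wide characters in s that belong to a named class. */
size_t
count_in_class(const wchar_t *s, const char *class_name)
{
	wctype_t cls;
	size_t n = 0;

	assert(s != NULL && class_name != NULL);
	cls = wctype(class_name);	/* for example "alpha" or "digit" */
	if (cls == (wctype_t)0)
		return 0;		/* unknown class name */
	for (; *s != L'\0'; s++)
		if (iswctype((wint_t)*s, cls))
			n++;
	return n;
}

/* Apply a named mapping to every wide character of s in place. */
void
map_wide_string(wchar_t *s, const char *mapping_name)
{
	wctrans_t map;

	assert(s != NULL && mapping_name != NULL);
	map = wctrans(mapping_name);	/* "toupper" or "tolower" */
	if (map == (wctrans_t)0)
		return;			/* unknown mapping name */
	for (; *s != L'\0'; s++)
		*s = (wchar_t)towctrans((wint_t)*s, map);
}
```

The class and mapping names accepted beyond the standard ones depend on
the locale in effect for the LC_CTYPE category.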
.Pp
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
The two primary approaches to using wide characters for I/O are:
.Bl -bullet -offset indent
.It
All I/O is performed using multibyte characters.
Input data is converted into wide characters immediately after
reading and data for output is converted from wide characters to
multibyte characters immediately before writing.
Conversion is achieved using
.Xr mbstowcs 3 ,
.Xr mbsrtowcs 3 ,
.Xr wcstombs 3 ,
.Xr wcsrtombs 3 ,
.Xr mblen 3 ,
.Xr mbrlen 3 ,
and
.Xr mbsinit 3 .
.It
Wide characters are used directly for I/O, using
.Xr getwchar 3 ,
.Xr fgetwc 3 ,
.Xr getwc 3 ,
.Xr ungetwc 3 ,
.Xr fgetws 3 ,
.Xr putwchar 3 ,
.Xr fputwc 3 ,
.Xr putwc 3 ,
and
.Xr fputws 3 .
They are also used for formatted I/O functions for wide characters
such as
.Xr fwscanf 3 ,
.Xr wscanf 3 ,
.Xr swscanf 3 ,
.Xr fwprintf 3 ,
.Xr wprintf 3 ,
.Xr swprintf 3 ,
.Xr vfwprintf 3 ,
.Xr vwprintf 3 ,
and
.Xr vswprintf 3 ,
and through the wide character conversion specifiers %lc, %C, %ls, and %S
in the conventional formatted I/O functions.
.El
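.Pp
The first approach, converting at the I/O boundary, can be sketched in C
as follows.
The function name
.Fn lowercase_multibyte
is hypothetical.
The sketch also assumes that calling
.Xr mbstowcs 3
with a null destination returns the number of wide characters required,
which is behavior described by POSIX.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Lowercase a multibyte string by converting it to wide characters,
 * mapping each one with towlower(3), and converting the result back.
 * The string is interpreted in the current LC_CTYPE locale.  Returns 0
 * on success and -1 if the input is invalid, memory is exhausted, or
 * out is too small to hold the terminated result.
 */
int
lowercase_multibyte(const char *in, char *out, size_t outlen)
{
	wchar_t *wide;
	size_t n, i, written;

	assert(in != NULL && out != NULL);
	/* Count the wide characters needed (POSIX behavior, see above). */
	n = mbstowcs(NULL, in, 0);
	if (n == (size_t)-1)
		return -1;
	wide = malloc((n + 1) * sizeof(*wide));
	if (wide == NULL)
		return -1;
	mbstowcs(wide, in, n + 1);
	for (i = 0; i < n; i++)
		wide[i] = (wchar_t)towlower((wint_t)wide[i]);
	written = wcstombs(out, wide, outlen);
	free(wide);
	/* wcstombs(3) does not terminate out when it fills the buffer. */
	if (written == (size_t)-1 || written >= outlen)
		return -1;
	return 0;
}
```

Working on wide characters in the middle means that the case mapping
sees one whole character at a time, regardless of how many bytes that
character occupies in the multibyte encoding.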
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr vidcontrol 1 ,
.Xr xfd 1 ,
.Xr xterm 1 ,
.Xr catgets 3 ,
.Xr gettext 3 ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3
.Sh BUGS
This man page is incomplete.