.\"     $NetBSD: nls.7,v 1.11 2003/06/26 11:55:56 wiz Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"        This product includes software developed by the NetBSD
.\"        Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\" $DragonFly: src/share/man/man7/nls.7,v 1.7 2008/05/02 02:05:06 swildner Exp $
.\"
.Dd May 17, 2003
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides commands for a single
worldwide operating system base.
An internationalized system has no built-in assumptions or dependencies
on language-specific or culture-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.It
Message-text language
.It
Character sets
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
(often abbreviated
.Dq i18n )
refers to the operation by which system software is developed to support
multiple culture-specific and language-specific conventions.
This is a generalization process by which the system is freed from
relying only on English strings or other English-specific conventions.
.Dq Localization
(often abbreviated
.Dq l10n )
refers to the operations by which the user environment is customized to
handle its input and output appropriately for specific language and cultural
conventions.
This is a specialization process, by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of cultural conventions for some country, together
with all associated translations targeted to the native language, is
called the
.Dq locale .
.Pp
.Dx
provides extensive support to programmers and system developers to
enable internationalized software to be developed.
.Dx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
ISO C specifies five standard categories and POSIX adds LC_MESSAGES;
all six are supported by
.Dx :
.Pp
.Bl -tag -compact -width LC_MONETARYXX
.It LC_COLLATE
string-collation order information
.It LC_CTYPE
character classification, case conversion, and other character attributes
.It LC_MESSAGES
the format for affirmative and negative responses
.It LC_MONETARY
rules and symbols for formatting monetary numeric information
.It LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It LC_TIME
rules and symbols for formatting time and date information
.El
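.Pp
As an illustration only, a program can ask which locale is in effect for
each category by passing a null locale argument to
.Xr setlocale 3 .
The table and helper names in this sketch are hypothetical and are not
part of any system interface.
.Bd -literal
```c
#include <locale.h>
#include <stddef.h>

/* One entry per locale category listed above. */
struct category_name {
	int category;
	const char *name;
};

const struct category_name categories[] = {
	{ LC_COLLATE,  "LC_COLLATE"  },
	{ LC_CTYPE,    "LC_CTYPE"    },
	{ LC_MESSAGES, "LC_MESSAGES" },
	{ LC_MONETARY, "LC_MONETARY" },
	{ LC_NUMERIC,  "LC_NUMERIC"  },
	{ LC_TIME,     "LC_TIME"     },
};

/* Number of entries in the table above. */
const size_t ncategories = sizeof(categories) / sizeof(categories[0]);

/*
 * Return the name of the locale currently in effect for one category.
 * Passing NULL as the second argument to setlocale() queries the
 * setting without changing it.
 */
const char *
current_locale_for(int category)
{
	return setlocale(category, NULL);
}
```
.Ed
.Pp
A program that has not yet called
.Xr setlocale 3
to select another locale runs in the C locale, so every category is
expected to report that locale at startup.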
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of directory names
where the message catalog files of the NLS database are located.
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale.
.Pp
The value of each of these environment variables is a string of the form:
.Bd -literal
	language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-character codes for many languages.
The codes are shown here in upper case; locale names use them in lower case.
Some common language codes are:
.Pp
.nf
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
\fILanguage Name\fP	\fICode\fP	\fILanguage Family\fP
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
.sp 5p
ABKHAZIAN	AB	IBERO-CAUCASIAN
AFAN (OROMO)	OM	HAMITIC
AFAR	AA	HAMITIC
AFRIKAANS	AF	GERMANIC
ALBANIAN	SQ	INDO-EUROPEAN (OTHER)
AMHARIC	AM	SEMITIC
ARABIC	AR	SEMITIC
ARMENIAN	HY	INDO-EUROPEAN (OTHER)
ASSAMESE	AS	INDIAN
AYMARA	AY	AMERINDIAN
AZERBAIJANI	AZ	TURKIC/ALTAIC
BASHKIR	BA	TURKIC/ALTAIC
BASQUE	EU	BASQUE
BENGALI	BN	INDIAN
BHUTANI	DZ	ASIAN
BIHARI	BH	INDIAN
BISLAMA	BI
BRETON	BR	CELTIC
BULGARIAN	BG	SLAVIC
BURMESE	MY	ASIAN
BYELORUSSIAN	BE	SLAVIC
CAMBODIAN	KM	ASIAN
CATALAN	CA	ROMANCE
CHINESE	ZH	ASIAN
CORSICAN	CO	ROMANCE
CROATIAN	HR	SLAVIC
CZECH	CS	SLAVIC
DANISH	DA	GERMANIC
DUTCH	NL	GERMANIC
ENGLISH	EN	GERMANIC
ESPERANTO	EO	INTERNATIONAL AUX.
ESTONIAN	ET	FINNO-UGRIC
FAROESE	FO	GERMANIC
FIJI	FJ	OCEANIC/INDONESIAN
FINNISH	FI	FINNO-UGRIC
FRENCH	FR	ROMANCE
FRISIAN	FY	GERMANIC
GALICIAN	GL	ROMANCE
GEORGIAN	KA	IBERO-CAUCASIAN
GERMAN	DE	GERMANIC
GREEK	EL	LATIN/GREEK
GREENLANDIC	KL	ESKIMO
GUARANI	GN	AMERINDIAN
GUJARATI	GU	INDIAN
HAUSA	HA	NEGRO-AFRICAN
HEBREW	HE	SEMITIC
HINDI	HI	INDIAN
HUNGARIAN	HU	FINNO-UGRIC
ICELANDIC	IS	GERMANIC
INDONESIAN	ID	OCEANIC/INDONESIAN
INTERLINGUA	IA	INTERNATIONAL AUX.
INTERLINGUE	IE	INTERNATIONAL AUX.
INUKTITUT	IU
INUPIAK	IK	ESKIMO
IRISH	GA	CELTIC
ITALIAN	IT	ROMANCE
JAPANESE	JA	ASIAN
JAVANESE	JV	OCEANIC/INDONESIAN
KANNADA	KN	DRAVIDIAN
KASHMIRI	KS	INDIAN
KAZAKH	KK	TURKIC/ALTAIC
KINYARWANDA	RW	NEGRO-AFRICAN
KIRGHIZ	KY	TURKIC/ALTAIC
KIRUNDI	RN	NEGRO-AFRICAN
KOREAN	KO	ASIAN
KURDISH	KU	IRANIAN
LAOTHIAN	LO	ASIAN
LATIN	LA	LATIN/GREEK
LATVIAN	LV	BALTIC
LINGALA	LN	NEGRO-AFRICAN
LITHUANIAN	LT	BALTIC
MACEDONIAN	MK	SLAVIC
MALAGASY	MG	OCEANIC/INDONESIAN
MALAY	MS	OCEANIC/INDONESIAN
MALAYALAM	ML	DRAVIDIAN
MALTESE	MT	SEMITIC
MAORI	MI	OCEANIC/INDONESIAN
MARATHI	MR	INDIAN
MOLDAVIAN	MO	ROMANCE
MONGOLIAN	MN
NAURU	NA
NEPALI	NE	INDIAN
NORWEGIAN	NO	GERMANIC
OCCITAN	OC	ROMANCE
ORIYA	OR	INDIAN
PASHTO	PS	IRANIAN
PERSIAN (farsi)	FA	IRANIAN
POLISH	PL	SLAVIC
PORTUGUESE	PT	ROMANCE
PUNJABI	PA	INDIAN
QUECHUA	QU	AMERINDIAN
RHAETO-ROMANCE	RM	ROMANCE
ROMANIAN	RO	ROMANCE
RUSSIAN	RU	SLAVIC
SAMOAN	SM	OCEANIC/INDONESIAN
SANGHO	SG	NEGRO-AFRICAN
SANSKRIT	SA	INDIAN
SCOTS GAELIC	GD	CELTIC
SERBIAN	SR	SLAVIC
SERBO-CROATIAN	SH	SLAVIC
SESOTHO	ST	NEGRO-AFRICAN
SETSWANA	TN	NEGRO-AFRICAN
SHONA	SN	NEGRO-AFRICAN
SINDHI	SD	INDIAN
SINGHALESE	SI	INDIAN
SISWATI	SS	NEGRO-AFRICAN
SLOVAK	SK	SLAVIC
SLOVENIAN	SL	SLAVIC
SOMALI	SO	HAMITIC
SPANISH	ES	ROMANCE
SUNDANESE	SU	OCEANIC/INDONESIAN
SWAHILI	SW	NEGRO-AFRICAN
SWEDISH	SV	GERMANIC
TAGALOG	TL	OCEANIC/INDONESIAN
TAJIK	TG	IRANIAN
TAMIL	TA	DRAVIDIAN
TATAR	TT	TURKIC/ALTAIC
TELUGU	TE	DRAVIDIAN
THAI	TH	ASIAN
TIBETAN	BO	ASIAN
TIGRINYA	TI	SEMITIC
TONGA	TO	OCEANIC/INDONESIAN
TSONGA	TS	NEGRO-AFRICAN
TURKISH	TR	TURKIC/ALTAIC
TURKMEN	TK	TURKIC/ALTAIC
TWI	TW	NEGRO-AFRICAN
UIGUR	UG
UKRAINIAN	UK	SLAVIC
URDU	UR	INDIAN
UZBEK	UZ	TURKIC/ALTAIC
VIETNAMESE	VI	ASIAN
VOLAPUK	VO	INTERNATIONAL AUX.
WELSH	CY	CELTIC
WOLOF	WO	NEGRO-AFRICAN
XHOSA	XH	NEGRO-AFRICAN
YIDDISH	YI	GERMANIC
YORUBA	YO	NEGRO-AFRICAN
ZHUANG	ZA
ZULU	ZU	NEGRO-AFRICAN
.ta
.fi
.Pp
For example, the locale for the Danish language spoken in Denmark
using the ISO8859-1 character set is da_DK.ISO8859-1.
The da stands for the Danish language and the DK stands for Denmark.
The short form da_DK is sufficient to indicate this locale.
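.Pp
The structure of a locale name can be made concrete with a small parser.
The following is only a sketch: the structure, the function names, and the
buffer sizes are hypothetical, and the code does not check the language
field against ISO 639.
.Bd -literal
```c
#include <stddef.h>
#include <string.h>

/* Components of language[_territory][.codeset][@modifier]. */
struct locale_name {
	char language[16];
	char territory[16];
	char codeset[32];
	char modifier[32];
};

/* Copy at most dstsz - 1 bytes of src into dst and terminate it. */
static void
copy_part(char *dst, size_t dstsz, const char *src, size_t len)
{
	if (len >= dstsz)
		len = dstsz - 1;
	memcpy(dst, src, len);
	dst[len] = 0;
}

/* Split a locale name; absent components become empty strings. */
void
parse_locale_name(const char *name, struct locale_name *out)
{
	size_t total = strlen(name);
	size_t lang_end, terr_end, code_end;

	memset(out, 0, sizeof(*out));

	/* The language ends at the first delimiter of any kind. */
	lang_end = strcspn(name, "_.@");
	copy_part(out->language, sizeof(out->language), name, lang_end);

	/* Optional territory: starts after '_', ends at '.' or '@'. */
	terr_end = lang_end;
	if (name[lang_end] == '_') {
		terr_end = lang_end + 1 + strcspn(name + lang_end + 1, ".@");
		copy_part(out->territory, sizeof(out->territory),
		    name + lang_end + 1, terr_end - lang_end - 1);
	}

	/* Optional codeset: starts after '.', ends at '@'. */
	code_end = terr_end;
	if (name[terr_end] == '.') {
		code_end = terr_end + 1 + strcspn(name + terr_end + 1, "@");
		copy_part(out->codeset, sizeof(out->codeset),
		    name + terr_end + 1, code_end - terr_end - 1);
	}

	/* Optional modifier: everything after '@'. */
	if (name[code_end] == '@')
		copy_part(out->modifier, sizeof(out->modifier),
		    name + code_end + 1, total - code_end - 1);
}
```
.Ed
.Pp
Given da_DK.ISO8859-1, this sketch yields the language da, the territory DK,
and the codeset ISO8859-1, and leaves the modifier empty.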
.Pp
The environment variable settings are queried by their priority level
in the following manner:
.Bl -bullet
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
Only the
.Ev LANG
environment variable should be set in
.Pa /etc/profile ,
since this makes it easiest for the user to override the system default
using the individual
.Ev LC_*
variables.
.It
If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the 7-bit ASCII character set and defines
information for the six categories.
.El
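.Pp
The priority order above can be sketched as a small function.
This is an illustration under stated assumptions, not the system's
implementation: the function names are hypothetical, and an empty value is
treated the same as an unset variable, as POSIX describes.
.Bd -literal
```c
#include <stddef.h>

/* Treat an unset (NULL) or empty variable as not set. */
static int
is_set(const char *value)
{
	return value != NULL && value[0] != 0;
}

/*
 * Pick the locale name for one category: LC_ALL first, then the
 * category's own variable, then LANG, and finally the C locale.
 */
const char *
effective_locale(const char *lc_all, const char *lc_category,
    const char *lang)
{
	if (is_set(lc_all))
		return lc_all;
	if (is_set(lc_category))
		return lc_category;
	if (is_set(lang))
		return lang;
	return "C";
}
```
.Ed
.Pp
A caller would typically pass the results of
.Xr getenv 3
for
.Ev LC_ALL ,
the category of interest, and
.Ev LANG .
Programs rarely need to do this themselves, because calling
.Xr setlocale 3
with an empty locale name performs the lookup.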
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a
particular language makes up a character set.
It is the encoding values in a character set that provide
the interface between the system and its input and output devices.
.Pp
The following character sets are supported in
.Dx :
.Bl -tag -width ISO8859_family
.It ISO8859 family
Industry-standard character sets are provided by means of the ISO8859
family of character sets, which provide a range of single-byte character set
support that includes Latin-1, Latin-2, Arabic, Cyrillic, Hebrew,
Greek, and Turkish.
.It eucJP
The eucJP character set is the industry-standard character set used to support
the Japanese locale.
.It Unicode
A Unicode environment based on the UTF-8 encoding is supported for all
supported languages and territories.
UTF-8 provides character support for most of the major languages of the
world and can be used in environments where multiple languages must be
processed simultaneously.
.El
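.Pp
In the single-byte ISO8859 character sets every byte is one character,
whereas UTF-8 uses between one and four bytes per character.
The following sketch counts characters in a UTF-8 string by skipping
continuation bytes.
The function name is hypothetical, and the input is assumed to be
well-formed UTF-8.
.Bd -literal
```c
#include <stddef.h>

/*
 * Count the characters (code points) in a UTF-8 string.
 * Continuation bytes always have the bit pattern 10xxxxxx, so every
 * byte that does not match that pattern starts a new character.
 * No validation of the input is attempted.
 */
size_t
utf8_length(const char *s)
{
	size_t count = 0;

	for (; *s != 0; s++) {
		if (((unsigned char)*s & 0xC0) != 0x80)
			count++;
	}
	return count;
}
```
.Ed
.Pp
For plain ASCII text the result equals the byte length, which is why
software written only for 7-bit ASCII often appears to work until it
meets other characters.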
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for the
corresponding characters in a character set.
A display must support a suitable font to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
.Xr xterm 1
includes support for UTF-8 character sets.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Dx
.Xr syscons 4
console provides support for loading a variety of fonts using the
.Xr vidcontrol 1
utility.
Available fonts can be found in
.Pa /usr/share/syscons/fonts .
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
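.Pp
A minimal sketch of the two interfaces used together follows.
The function name and the output format are hypothetical; the
CODESET, RADIXCHAR, and D_FMT items are standard
.Xr nl_langinfo 3
items.
.Bd -literal
```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Select a locale for the whole program and report a few of its
 * properties in the caller's buffer.  An empty name asks setlocale()
 * to consult the environment; "C" selects the default C locale.
 * Returns 0 on success, -1 if the locale could not be selected.
 */
int
describe_locale(const char *name, char *out, size_t outsz)
{
	if (setlocale(LC_ALL, name) == NULL)
		return -1;

	snprintf(out, outsz, "codeset=%s radix=%s date=%s",
	    nl_langinfo(CODESET), nl_langinfo(RADIXCHAR),
	    nl_langinfo(D_FMT));
	return 0;
}
```
.Ed
.Pp
Calling the function with an empty name reports the locale chosen by the
user's environment variables.
The exact codeset name reported for a given locale varies between systems.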
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages, as needed.
.Pp
.Dx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a standard which is
well supported.
Unfortunately, the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also does not support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
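.Pp
As a hedged sketch of the
.Xr catgets 3
interface, the following helper looks up one message and falls back to a
built-in string when no catalog is available.
The function name and its parameters are hypothetical; a real program
would use the set and message numbers from its own message source file,
compiled with
.Xr gencat 1 .
.Bd -literal
```c
#include <nl_types.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Fetch one message from a catalog, falling back to a built-in string
 * when the catalog or the message is missing.  The text is copied into
 * the caller's buffer because the pointer returned by catgets() should
 * not be used after catclose().
 */
void
lookup_message(const char *catalog, int set_id, int msg_id,
    const char *fallback, char *out, size_t outsz)
{
	nl_catd catd = catopen(catalog, NL_CAT_LOCALE);

	if (catd == (nl_catd)-1) {
		/* No catalog found: use the untranslated text. */
		snprintf(out, outsz, "%s", fallback);
		return;
	}
	snprintf(out, outsz, "%s",
	    catgets(catd, set_id, msg_id, fallback));
	catclose(catd);
}
```
.Ed
.Pp
Passing the untranslated text as the fallback keeps the program usable
when no catalog is installed for the user's locale.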
.Ss Support for Multibyte Characters and Wide Characters
Character sets with multibyte characters may be difficult to decode, or may
contain state (i.e., adjacent characters are dependent).
ISO C specifies a set of functions using
.Dq wide characters
which can handle multibyte characters properly.
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
.Pp
There are two types for wide characters:
.Em wchar_t
and
.Em wint_t .
.Em wchar_t
is a type which can contain one wide character and behaves as the
.Vt char
type does for one character.
.Em wint_t
can contain one wide character or WEOF (wide EOF).
.Pp
There are functions that operate on
.Em wchar_t
and substitute for functions operating on
.Vt char .
See
.Xr wmemchr 3
and
.Xr towlower 3
for details.
There are some additional functions that operate on
.Em wchar_t .
See
.Xr wctype 3
and
.Xr wctrans 3
for details.
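.Pp
To illustrate how the wide-character functions mirror their
.Vt char
counterparts, the following sketch lower-cases a wide string and counts
its alphabetic characters.
The function names are hypothetical.
.Bd -literal
```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Convert a wide string to lower case in place, the wide-character
 * counterpart of looping over a char string with tolower().
 * Which characters have a lower-case form depends on LC_CTYPE.
 */
void
wide_to_lower(wchar_t *s)
{
	for (; *s != 0; s++)
		*s = (wchar_t)towlower((wint_t)*s);
}

/* Count the alphabetic characters in a wide string using iswalpha(). */
size_t
wide_count_alpha(const wchar_t *s)
{
	size_t n = 0;

	for (; *s != 0; s++) {
		if (iswalpha((wint_t)*s))
			n++;
	}
	return n;
}
```
.Ed
.Pp
Because each element is a complete character, the loops never have to
consider whether a byte is part of a longer sequence.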
.Pp
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
There are two primary ways of using wide characters for I/O:
.Bl -bullet -offset indent
.It
All I/O is performed using multibyte characters.
Input data is converted into wide characters immediately after
reading and data for output is converted from wide characters to
multibyte characters immediately before writing.
Conversion is achieved using
.Xr mbstowcs 3 ,
.Xr mbsrtowcs 3 ,
.Xr wcstombs 3 ,
.Xr wcsrtombs 3 ,
.Xr mblen 3 ,
.Xr mbrlen 3 ,
and
.Xr mbsinit 3 .
.It
Wide characters are used directly for I/O, using
.Xr getwchar 3 ,
.Xr fgetwc 3 ,
.Xr getwc 3 ,
.Xr ungetwc 3 ,
.Xr fgetws 3 ,
.Xr putwchar 3 ,
.Xr fputwc 3 ,
.Xr putwc 3 ,
and
.Xr fputws 3 .
They are also used with the formatted I/O functions for wide characters
such as
.Xr fwscanf 3 ,
.Xr wscanf 3 ,
.Xr swscanf 3 ,
.Xr fwprintf 3 ,
.Xr wprintf 3 ,
.Xr swprintf 3 ,
.Xr vfwprintf 3 ,
.Xr vwprintf 3 ,
and
.Xr vswprintf 3 ,
and with the wide-character conversion specifiers %lc, %C, %ls, and %S
of the conventional formatted I/O functions.
.El
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr vidcontrol 1 ,
.Xr xfd 1 ,
.Xr xterm 1 ,
.Xr catgets 3 ,
.Xr gettext 3 Pq Pa pkgsrc/devel/gettext ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3
.Sh BUGS
This man page is incomplete.