.\"     $NetBSD: nls.7,v 1.15 2009/04/09 02:51:54 joerg Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd February 21, 2007
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides commands for a single
worldwide operating system base.
An internationalized system has no built-in assumptions or dependencies
on language-specific or cultural-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.It
Message-text language
.It
Character sets
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
(often abbreviated
.Dq i18n )
refers to the operation by which system software is developed to support
multiple cultural-specific and language-specific conventions.
This is a generalization process by which the system is decoupled from
English strings and other English-specific conventions.
.Dq Localization
(often abbreviated
.Dq l10n )
refers to the operations by which the user environment is customized to
handle its input and output appropriate for specific language and cultural
conventions.
This is a specialization process, by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of cultural conventions for some country, together
with all associated translations targeted to the native language, is
called the
.Dq locale .
.Pp
.Nx
provides extensive support to programmers and system developers to
enable internationalized software to be developed.
.Nx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
The following six standard categories are supported by
.Nx ;
all are specified by ISO C except
.Ev LC_MESSAGES ,
which is specified by POSIX:
.Pp
.Bl -tag -compact -width LC_MONETARYXX
.It Ev LC_COLLATE
string-collation order information
.It Ev LC_CTYPE
character classification, case conversion, and other character attributes
.It Ev LC_MESSAGES
the format for affirmative and negative responses
.It Ev LC_MONETARY
rules and symbols for formatting monetary numeric information
.It Ev LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It Ev LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of directory names
where the message catalog files of the NLS database are located.
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale.
.Pp
The values of these environment variables are strings of the form:
.Pp
.Bd -literal
	language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-character codes for many languages.
Locale names use the lowercase form of these codes.
Some common language codes are:
.Pp
.Bl -column "PERSIAN (farsi)" "Sy Code" "OCEANIC/INDONESIAN"
.It Sy Language Name Ta Sy Code Ta Sy Language Family
.It ABKHAZIAN	AB	IBERO-CAUCASIAN
.It AFAN (OROMO)	OM	HAMITIC
.It AFAR	AA	HAMITIC
.It AFRIKAANS	AF	GERMANIC
.It ALBANIAN	SQ	INDO-EUROPEAN (OTHER)
.It AMHARIC	AM	SEMITIC
.It ARABIC	AR	SEMITIC
.It ARMENIAN	HY	INDO-EUROPEAN (OTHER)
.It ASSAMESE	AS	INDIAN
.It AYMARA	AY	AMERINDIAN
.It AZERBAIJANI	AZ	TURKIC/ALTAIC
.It BASHKIR	BA	TURKIC/ALTAIC
.It BASQUE	EU	BASQUE
.It BENGALI	BN	INDIAN
.It BHUTANI	DZ	ASIAN
.It BIHARI	BH	INDIAN
.It BISLAMA     Ta BI   Ta ""
.It BRETON	BR	CELTIC
.It BULGARIAN	BG	SLAVIC
.It BURMESE	MY	ASIAN
.It BYELORUSSIAN	BE	SLAVIC
.It CAMBODIAN	KM	ASIAN
.It CATALAN	CA	ROMANCE
.It CHINESE	ZH	ASIAN
.It CORSICAN	CO	ROMANCE
.It CROATIAN	HR	SLAVIC
.It CZECH	CS	SLAVIC
.It DANISH	DA	GERMANIC
.It DUTCH	NL	GERMANIC
.It ENGLISH	EN	GERMANIC
.It ESPERANTO	EO	INTERNATIONAL AUX.
.It ESTONIAN	ET	FINNO-UGRIC
.It FAROESE	FO	GERMANIC
.It FIJI	FJ	OCEANIC/INDONESIAN
.It FINNISH	FI	FINNO-UGRIC
.It FRENCH	FR	ROMANCE
.It FRISIAN	FY	GERMANIC
.It GALICIAN	GL	ROMANCE
.It GEORGIAN	KA	IBERO-CAUCASIAN
.It GERMAN	DE	GERMANIC
.It GREEK	EL	LATIN/GREEK
.It GREENLANDIC	KL	ESKIMO
.It GUARANI	GN	AMERINDIAN
.It GUJARATI	GU	INDIAN
.It HAUSA	HA	NEGRO-AFRICAN
.It HEBREW	HE	SEMITIC
.It HINDI	HI	INDIAN
.It HUNGARIAN	HU	FINNO-UGRIC
.It ICELANDIC	IS	GERMANIC
.It INDONESIAN	ID	OCEANIC/INDONESIAN
.It INTERLINGUA	IA	INTERNATIONAL AUX.
.It INTERLINGUE	IE	INTERNATIONAL AUX.
.It INUKTITUT   Ta IU   Ta ""
.It INUPIAK	IK	ESKIMO
.It IRISH	GA	CELTIC
.It ITALIAN	IT	ROMANCE
.It JAPANESE	JA	ASIAN
.It JAVANESE	JV	OCEANIC/INDONESIAN
.It KANNADA	KN	DRAVIDIAN
.It KASHMIRI	KS	INDIAN
.It KAZAKH	KK	TURKIC/ALTAIC
.It KINYARWANDA	RW	NEGRO-AFRICAN
.It KIRGHIZ	KY	TURKIC/ALTAIC
.It KIRUNDI	RN	NEGRO-AFRICAN
.It KOREAN	KO	ASIAN
.It KURDISH	KU	IRANIAN
.It LAOTHIAN	LO	ASIAN
.It LATIN	LA	LATIN/GREEK
.It LATVIAN	LV	BALTIC
.It LINGALA	LN	NEGRO-AFRICAN
.It LITHUANIAN	LT	BALTIC
.It MACEDONIAN	MK	SLAVIC
.It MALAGASY	MG	OCEANIC/INDONESIAN
.It MALAY	MS	OCEANIC/INDONESIAN
.It MALAYALAM	ML	DRAVIDIAN
.It MALTESE	MT	SEMITIC
.It MAORI	MI	OCEANIC/INDONESIAN
.It MARATHI	MR	INDIAN
.It MOLDAVIAN	MO	ROMANCE
.It MONGOLIAN   Ta MN   Ta ""
.It NAURU       Ta NA   Ta ""
.It NEPALI	NE	INDIAN
.It NORWEGIAN	NO	GERMANIC
.It OCCITAN	OC	ROMANCE
.It ORIYA	OR	INDIAN
.It PASHTO	PS	IRANIAN
.It PERSIAN (farsi)	FA	IRANIAN
.It POLISH	PL	SLAVIC
.It PORTUGUESE	PT	ROMANCE
.It PUNJABI	PA	INDIAN
.It QUECHUA	QU	AMERINDIAN
.It RHAETO-ROMANCE	RM	ROMANCE
.It ROMANIAN	RO	ROMANCE
.It RUSSIAN	RU	SLAVIC
.It SAMOAN	SM	OCEANIC/INDONESIAN
.It SANGHO	SG	NEGRO-AFRICAN
.It SANSKRIT	SA	INDIAN
.It SCOTS GAELIC	GD	CELTIC
.It SERBIAN	SR	SLAVIC
.It SERBO-CROATIAN	SH	SLAVIC
.It SESOTHO	ST	NEGRO-AFRICAN
.It SETSWANA	TN	NEGRO-AFRICAN
.It SHONA	SN	NEGRO-AFRICAN
.It SINDHI	SD	INDIAN
.It SINGHALESE	SI	INDIAN
.It SISWATI	SS	NEGRO-AFRICAN
.It SLOVAK	SK	SLAVIC
.It SLOVENIAN	SL	SLAVIC
.It SOMALI	SO	HAMITIC
.It SPANISH	ES	ROMANCE
.It SUNDANESE	SU	OCEANIC/INDONESIAN
.It SWAHILI	SW	NEGRO-AFRICAN
.It SWEDISH	SV	GERMANIC
.It TAGALOG	TL	OCEANIC/INDONESIAN
.It TAJIK	TG	IRANIAN
.It TAMIL	TA	DRAVIDIAN
.It TATAR	TT	TURKIC/ALTAIC
.It TELUGU	TE	DRAVIDIAN
.It THAI	TH	ASIAN
.It TIBETAN	BO	ASIAN
.It TIGRINYA	TI	SEMITIC
.It TONGA	TO	OCEANIC/INDONESIAN
.It TSONGA	TS	NEGRO-AFRICAN
.It TURKISH	TR	TURKIC/ALTAIC
.It TURKMEN	TK	TURKIC/ALTAIC
.It TWI	TW	NEGRO-AFRICAN
.It UIGUR       Ta UG   Ta ""
.It UKRAINIAN	UK	SLAVIC
.It URDU	UR	INDIAN
.It UZBEK	UZ	TURKIC/ALTAIC
.It VIETNAMESE	VI	ASIAN
.It VOLAPUK	VO	INTERNATIONAL AUX.
.It WELSH	CY	CELTIC
.It WOLOF	WO	NEGRO-AFRICAN
.It XHOSA	XH	NEGRO-AFRICAN
.It YIDDISH	YI	GERMANIC
.It YORUBA	YO	NEGRO-AFRICAN
.It ZHUANG      Ta ZA   Ta ""
.It ZULU	ZU	NEGRO-AFRICAN
.El
.Pp
For example, the locale for the Danish language spoken in Denmark
using the ISO 8859-1 character set is da_DK.ISO8859-1.
The da stands for the Danish language and the DK stands for Denmark.
The short form of da_DK is sufficient to indicate this locale.
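.Pp
As an illustration only, the fields of a locale name could be separated
as in the following C sketch.
The
.Vt struct locale_parts
type, its field sizes, and the
.Fn split_locale_name
and
.Fn copy_field
functions are hypothetical names made up for this example; they are not
part of any system interface, and real locale handling may be stricter
about what it accepts.
.Bd -literal
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: the four fields of a locale name. */
struct locale_parts {
	char language[16];
	char territory[16];
	char codeset[32];
	char modifier[32];
};

/* Copy at most dstsize - 1 bytes of src and terminate the result. */
static void
copy_field(char *dst, size_t dstsize, const char *src, size_t len)
{
	if (len >= dstsize)
		len = dstsize - 1;
	memcpy(dst, src, len);
	dst[len] = 0;
}

/* Split language[_territory][.codeset][@modifier] into its fields. */
static void
split_locale_name(const char *name, struct locale_parts *out)
{
	size_t n;

	memset(out, 0, sizeof(*out));

	/* The language runs up to the first separator, if any. */
	n = strcspn(name, "_.@");
	copy_field(out->language, sizeof(out->language), name, n);
	name += n;

	if (*name == '_') {
		name++;
		n = strcspn(name, ".@");
		copy_field(out->territory, sizeof(out->territory), name, n);
		name += n;
	}
	if (*name == '.') {
		name++;
		n = strcspn(name, "@");
		copy_field(out->codeset, sizeof(out->codeset), name, n);
		name += n;
	}
	if (*name == '@')
		copy_field(out->modifier, sizeof(out->modifier),
		    name + 1, strlen(name + 1));
}
```
.Ed
.Pp
Given da_DK.ISO8859-1, the sketch stores da as the language, DK as the
territory, and ISO8859-1 as the codeset, and leaves the modifier empty.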
.Pp
The environment variable settings are queried by their priority level
in the following manner:
.Pp
.Bl -bullet
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
Only the
.Ev LANG
environment variable should be set in
.Pa /etc/profile ,
since this makes it easiest for the user to override the system default
using the individual
.Ev LC_*
variables.
.It
If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the ASCII character set and defines
information for the six categories.
.El
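.Pp
The priority order above can be sketched in C as follows.
This is a minimal illustration and not the code used by
.Xr setlocale 3 ;
the
.Fn effective_locale
function is a hypothetical name, and the sketch assumes, following the
usual POSIX convention, that a variable set to the empty string counts as
not set.
.Bd -literal
```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Return the locale name that applies to one category.
 * category_var is the environment variable named after the category,
 * for example "LC_TIME".
 */
static const char *
effective_locale(const char *category_var)
{
	const char *v;

	/* LC_ALL overrides every category. */
	if ((v = getenv("LC_ALL")) != NULL && *v != 0)
		return v;
	/* Otherwise the variable of the category itself is used. */
	if ((v = getenv(category_var)) != NULL && *v != 0)
		return v;
	/* LANG supplies the default for categories left unset. */
	if ((v = getenv("LANG")) != NULL && *v != 0)
		return v;
	/* With nothing set, the category falls back to the C locale. */
	return "C";
}
```
.Ed
.Pp
For instance, with only
.Ev LANG
set to da_DK.ISO8859-1, a call with "LC_TIME" returns da_DK.ISO8859-1.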
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a
particular language make up a character set.
It is the encoding values in a character set that provide
the interface between the system and its input and output devices.
.Pp
The following character sets are supported in
.Nx :
.Bl -tag -width ISO_8859_family
.It ASCII
The American Standard Code for Information Interchange (ASCII) standard
specifies 128 Roman characters and control codes, encoded in a 7-bit
character encoding scheme.
.It ISO 8859 family
Industry-standard character sets specified by the ISO/IEC 8859
standard.
The standard is divided into 15 numbered parts, with each
part specifying broad script similarities.
Examples include Western European, Central European, Arabic, Cyrillic,
Hebrew, Greek, and Turkish.
The character sets use an 8-bit character encoding scheme which is
compatible with the ASCII character set.
.It Unicode
The Unicode character set is the full set of known abstract characters of
all real-world scripts.
It can be used in environments where multiple
scripts must be processed simultaneously.
Unicode is compatible with ISO 8859-1 (Western European) and ASCII.
Many character encoding schemes are available for Unicode, including UTF-8,
UTF-16 and UTF-32.
These encoding schemes are multi-byte encodings.
The UTF-8 encoding scheme is a variable-width encoding built from 8-bit
units and is compatible with ASCII.
The UTF-16 encoding scheme is a variable-width encoding built from 16-bit
units.
The UTF-32 encoding scheme is a fixed-width encoding built from 32-bit
units.
.El
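.Pp
To make the variable width of UTF-8 concrete, the following C sketch
computes how many 8-bit units UTF-8 needs for a given Unicode code point.
The
.Fn utf8_length
function is a hypothetical name used only for this illustration.
.Bd -literal
```c
#include <stddef.h>

/*
 * Number of bytes UTF-8 uses for one Unicode code point, or 0 if the
 * value is a surrogate or lies outside the Unicode range.
 */
static size_t
utf8_length(unsigned long cp)
{
	if (cp < 0x80)
		return 1;	/* same single byte as in ASCII */
	if (cp < 0x800)
		return 2;
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;	/* surrogates cannot be encoded */
	if (cp < 0x10000)
		return 3;
	if (cp <= 0x10FFFF)
		return 4;
	return 0;
}
```
.Ed
.Pp
The letter A (U+0041) takes one byte, while the euro sign (U+20AC) takes
three.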
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for a
corresponding character in a character set.
A display must support a suitable font to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
.Xr xterm 1
includes support for Unicode with UTF-8 encoding.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Nx
.Xr wscons 4
console provides support for loading fonts using the
.Xr wsfontload 8
utility.
Currently, only fonts for the ISO8859-1 family of character sets are
supported.
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
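.Pp
A minimal C sketch of these two interfaces is shown here.
The
.Fn current_codeset
function is a hypothetical wrapper invented for this example; falling
back to the C locale when the requested locale is unavailable is a design
choice of the sketch, not a requirement.
.Bd -literal
```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/*
 * Adopt the locale named by the environment and return the name of
 * its codeset.  If the environment names a locale that cannot be
 * selected, the sketch falls back to the C locale.
 */
static const char *
current_codeset(void)
{
	if (setlocale(LC_ALL, "") == NULL)
		(void)setlocale(LC_ALL, "C");
	return nl_langinfo(CODESET);
}
```
.Ed
.Pp
The string returned depends on the system and on the user's environment,
so programs should compare it rather than assume a particular spelling.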
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages, as needed.
.Pp
.Nx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a standard which is
well supported.
Unfortunately, the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also doesn't support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
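.Pp
As a rough sketch of the
.Xr catgets 3
style of lookup, the following C fragment returns a built-in default
string whenever no catalog is available.
The
.Fn lookup_message
function and the set and message numbers passed to it are hypothetical;
only
.Xr catopen 3
and
.Xr catgets 3
are real interfaces.
.Bd -literal
```c
#include <nl_types.h>

/*
 * Fetch message msg_id of set set_id from an open catalog.  The
 * fallback string is returned when the catalog could not be opened
 * or does not contain the message.
 */
static const char *
lookup_message(nl_catd catd, int set_id, int msg_id, const char *fallback)
{
	/* catopen() reports failure as (nl_catd)-1. */
	if (catd == (nl_catd)-1)
		return fallback;
	return catgets(catd, set_id, msg_id, fallback);
}
```
.Ed
.Pp
A program would typically call
.Fn catopen
once at startup, for example with a catalog name of its own choosing and
the
.Dv NL_CAT_LOCALE
flag, and keep the descriptor open for as long as the returned strings
are in use.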
.Ss Support for Multi-byte Encodings
Some character sets with multi-byte encodings may be difficult to decode,
or may contain state (i.e., adjacent characters are dependent).
ISO C specifies a set of functions using 'wide characters' which can handle
multi-byte encodings properly.
The behaviour of these functions is affected
by the
.Ev LC_CTYPE
category of the current locale.
.Pp
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
There are two types for wide characters:
.Em wchar_t
and
.Em wint_t .
.Em wchar_t
is a type which can contain one wide character and operates as the 'char'
type does for one character.
.Em wint_t
can contain one wide character or WEOF (wide EOF).
.Pp
There are functions that operate on
.Em wchar_t ,
and substitute for functions operating on 'char'.
See
.Xr wmemchr 3
and
.Xr towlower 3
for details.
There are some additional functions that operate on
.Em wchar_t .
See
.Xr wctype 3
and
.Xr wctrans 3
for details.
.Pp
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
There are two primary ways of using wide characters for I/O:
.Bl -bullet -offset indent
.It
All I/O is performed using multibyte characters.
Input data is converted into wide characters immediately after
reading and data for output is converted from wide characters to
multi-byte encoding immediately before writing.
Conversion is performed with
.Xr mbstowcs 3 ,
.Xr mbsrtowcs 3 ,
.Xr wcstombs 3 ,
.Xr wcsrtombs 3 ,
.Xr mblen 3 ,
.Xr mbrlen 3 ,
and
.Xr mbsinit 3 .
.It
Wide characters are used directly for I/O, using
.Xr getwchar 3 ,
.Xr fgetwc 3 ,
.Xr getwc 3 ,
.Xr ungetwc 3 ,
.Xr fgetws 3 ,
.Xr putwchar 3 ,
.Xr fputwc 3 ,
.Xr putwc 3 ,
and
.Xr fputws 3 .
They are also used with the formatted I/O functions for wide characters
such as
.Xr fwscanf 3 ,
.Xr wscanf 3 ,
.Xr swscanf 3 ,
.Xr fwprintf 3 ,
.Xr wprintf 3 ,
.Xr swprintf 3 ,
.Xr vfwprintf 3 ,
.Xr vwprintf 3 ,
and
.Xr vswprintf 3 ,
and with the wide-character conversion specifiers %lc, %C, %ls, and %S
in the conventional formatted I/O functions.
.El
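.Pp
The first approach, converting to wide characters after input and back to
a multi-byte encoding before output, might look like the following C
sketch.
The
.Fn upcase_multibyte
function and its fixed 256-element buffer are assumptions made for this
illustration; the result depends on the
.Ev LC_CTYPE
category in effect when it is called.
.Bd -literal
```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Convert a multibyte string to upper case.  The input is turned
 * into wide characters, processed one character at a time, and
 * converted back.  Returns 0 on success and -1 on failure.
 */
static int
upcase_multibyte(const char *in, char *out, size_t outsize)
{
	wchar_t wbuf[256];
	size_t max = sizeof(wbuf) / sizeof(wbuf[0]);
	size_t n, i;

	n = mbstowcs(wbuf, in, max);
	if (n == (size_t)-1 || n == max)
		return -1;	/* invalid input or input too long */

	for (i = 0; i < n; i++)
		wbuf[i] = (wchar_t)towupper((wint_t)wbuf[i]);

	n = wcstombs(out, wbuf, outsize);
	if (n == (size_t)-1 || n == outsize)
		return -1;	/* unconvertible or output too small */
	return 0;
}
```
.Ed
.Pp
Working on
.Em wchar_t
values means each loop step handles one whole character, regardless of
how many bytes it occupies in the multi-byte encoding.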
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr xfd 1 ,
.Xr xterm 1 ,
.Xr catgets 3 ,
.Xr gettext 3 ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3 ,
.Xr wsfontload 8
.Sh BUGS
This man page is incomplete.