.\"     $NetBSD: nls.7,v 1.15 2009/04/09 02:51:54 joerg Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd November 24, 2013
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides commands for a single
worldwide operating system base.
An internationalized system has no built-in assumptions or dependencies
on language-specific or cultural-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.It
Message-text language
.It
Character sets
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
(often abbreviated
.Dq i18n )
refers to the process by which system software is developed to support
multiple cultural-specific and language-specific conventions.
This is a generalization process that frees the system from
relying solely on English strings and other English-specific conventions.
.Dq Localization
(often abbreviated
.Dq l10n )
refers to the process by which the user environment is customized to
handle its input and output appropriately for specific language and
cultural conventions.
This is a specialization process, by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of cultural conventions for some country, together
with all associated translations targeted to the native language, is
called the
.Dq locale .
.Pp
.Dx
provides extensive support to programmers and system developers to
enable internationalized software to be developed.
.Dx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
The following six standard categories are supported by
.Dx ;
.Ev LC_MESSAGES
is specified by POSIX and the other five by ISO C:
.Pp
.Bl -tag -compact -width ".Ev LC_MONETARY"
.It Ev LC_COLLATE
string-collation order information
.It Ev LC_CTYPE
character classification, case conversion, and other character attributes
.It Ev LC_MESSAGES
the format for affirmative and negative responses
.It Ev LC_MONETARY
rules and symbols for formatting monetary numeric information
.It Ev LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It Ev LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of directory names
where the message catalog files of the NLS database are located.
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale.
.Pp
The value of each locale environment variable is a string of the form:
.Bd -literal
	language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-letter codes for many languages.
Some common language codes are:
.Bl -column "PERSIAN (farsi)" "Sy Code" "OCEANIC/INDONESIAN"
.It Sy Language Name Ta Sy Code Ta Sy Language Family
.It ABKHAZIAN    Ta AB   Ta IBERO-CAUCASIAN
.It AFAN (OROMO) Ta OM   Ta HAMITIC
.It AFAR         Ta AA   Ta HAMITIC
.It AFRIKAANS    Ta AF   Ta GERMANIC
.It ALBANIAN     Ta SQ   Ta INDO-EUROPEAN (OTHER)
.It AMHARIC      Ta AM   Ta SEMITIC
.It ARABIC       Ta AR   Ta SEMITIC
.It ARMENIAN     Ta HY   Ta INDO-EUROPEAN (OTHER)
.It ASSAMESE     Ta AS   Ta INDIAN
.It AYMARA       Ta AY   Ta AMERINDIAN
.It AZERBAIJANI  Ta AZ   Ta TURKIC/ALTAIC
.It BASHKIR      Ta BA   Ta TURKIC/ALTAIC
.It BASQUE       Ta EU   Ta BASQUE
.It BENGALI      Ta BN   Ta INDIAN
.It BHUTANI      Ta DZ   Ta ASIAN
.It BIHARI       Ta BH   Ta INDIAN
.It BISLAMA      Ta BI   Ta ""
.It BRETON       Ta BR   Ta CELTIC
.It BULGARIAN    Ta BG   Ta SLAVIC
.It BURMESE      Ta MY   Ta ASIAN
.It BYELORUSSIAN Ta BE   Ta SLAVIC
.It CAMBODIAN    Ta KM   Ta ASIAN
.It CATALAN      Ta CA   Ta ROMANCE
.It CHINESE      Ta ZH   Ta ASIAN
.It CORSICAN     Ta CO   Ta ROMANCE
.It CROATIAN     Ta HR   Ta SLAVIC
.It CZECH        Ta CS   Ta SLAVIC
.It DANISH       Ta DA   Ta GERMANIC
.It DUTCH        Ta NL   Ta GERMANIC
.It ENGLISH      Ta EN   Ta GERMANIC
.It ESPERANTO    Ta EO   Ta INTERNATIONAL AUX.
.It ESTONIAN     Ta ET   Ta FINNO-UGRIC
.It FAROESE      Ta FO   Ta GERMANIC
.It FIJI         Ta FJ   Ta OCEANIC/INDONESIAN
.It FINNISH      Ta FI   Ta FINNO-UGRIC
.It FRENCH       Ta FR   Ta ROMANCE
.It FRISIAN      Ta FY   Ta GERMANIC
.It GALICIAN     Ta GL   Ta ROMANCE
.It GEORGIAN     Ta KA   Ta IBERO-CAUCASIAN
.It GERMAN       Ta DE   Ta GERMANIC
.It GREEK        Ta EL   Ta LATIN/GREEK
.It GREENLANDIC  Ta KL   Ta ESKIMO
.It GUARANI      Ta GN   Ta AMERINDIAN
.It GUJARATI     Ta GU   Ta INDIAN
.It HAUSA        Ta HA   Ta NEGRO-AFRICAN
.It HEBREW       Ta HE   Ta SEMITIC
.It HINDI        Ta HI   Ta INDIAN
.It HUNGARIAN    Ta HU   Ta FINNO-UGRIC
.It ICELANDIC    Ta IS   Ta GERMANIC
.It INDONESIAN   Ta ID   Ta OCEANIC/INDONESIAN
.It INTERLINGUA  Ta IA   Ta INTERNATIONAL AUX.
.It INTERLINGUE  Ta IE   Ta INTERNATIONAL AUX.
.It INUKTITUT    Ta IU   Ta ""
.It INUPIAK      Ta IK   Ta ESKIMO
.It IRISH        Ta GA   Ta CELTIC
.It ITALIAN      Ta IT   Ta ROMANCE
.It JAPANESE     Ta JA   Ta ASIAN
.It JAVANESE     Ta JV   Ta OCEANIC/INDONESIAN
.It KANNADA      Ta KN   Ta DRAVIDIAN
.It KASHMIRI     Ta KS   Ta INDIAN
.It KAZAKH       Ta KK   Ta TURKIC/ALTAIC
.It KINYARWANDA  Ta RW   Ta NEGRO-AFRICAN
.It KIRGHIZ      Ta KY   Ta TURKIC/ALTAIC
.It KIRUNDI      Ta RN   Ta NEGRO-AFRICAN
.It KOREAN       Ta KO   Ta ASIAN
.It KURDISH      Ta KU   Ta IRANIAN
.It LAOTHIAN     Ta LO   Ta ASIAN
.It LATIN        Ta LA   Ta LATIN/GREEK
.It LATVIAN      Ta LV   Ta BALTIC
.It LINGALA      Ta LN   Ta NEGRO-AFRICAN
.It LITHUANIAN   Ta LT   Ta BALTIC
.It MACEDONIAN   Ta MK   Ta SLAVIC
.It MALAGASY     Ta MG   Ta OCEANIC/INDONESIAN
.It MALAY        Ta MS   Ta OCEANIC/INDONESIAN
.It MALAYALAM    Ta ML   Ta DRAVIDIAN
.It MALTESE      Ta MT   Ta SEMITIC
.It MAORI        Ta MI   Ta OCEANIC/INDONESIAN
.It MARATHI      Ta MR   Ta INDIAN
.It MOLDAVIAN    Ta MO   Ta ROMANCE
.It MONGOLIAN    Ta MN   Ta ""
.It NAURU        Ta NA   Ta ""
.It NEPALI       Ta NE   Ta INDIAN
.It NORWEGIAN    Ta NO   Ta GERMANIC
.It OCCITAN      Ta OC   Ta ROMANCE
.It ORIYA        Ta OR   Ta INDIAN
.It PASHTO       Ta PS   Ta IRANIAN
.It PERSIAN (farsi) Ta FA   Ta IRANIAN
.It POLISH       Ta PL   Ta SLAVIC
.It PORTUGUESE   Ta PT   Ta ROMANCE
.It PUNJABI      Ta PA   Ta INDIAN
.It QUECHUA      Ta QU   Ta AMERINDIAN
.It RHAETO-ROMANCE Ta RM   Ta ROMANCE
.It ROMANIAN     Ta RO   Ta ROMANCE
.It RUSSIAN      Ta RU   Ta SLAVIC
.It SAMOAN       Ta SM   Ta OCEANIC/INDONESIAN
.It SANGHO       Ta SG   Ta NEGRO-AFRICAN
.It SANSKRIT     Ta SA   Ta INDIAN
.It SCOTS GAELIC Ta GD   Ta CELTIC
.It SERBIAN      Ta SR   Ta SLAVIC
.It SERBO-CROATIAN Ta SH   Ta SLAVIC
.It SESOTHO      Ta ST   Ta NEGRO-AFRICAN
.It SETSWANA     Ta TN   Ta NEGRO-AFRICAN
.It SHONA        Ta SN   Ta NEGRO-AFRICAN
.It SINDHI       Ta SD   Ta INDIAN
.It SINGHALESE   Ta SI   Ta INDIAN
.It SISWATI      Ta SS   Ta NEGRO-AFRICAN
.It SLOVAK       Ta SK   Ta SLAVIC
.It SLOVENIAN    Ta SL   Ta SLAVIC
.It SOMALI       Ta SO   Ta HAMITIC
.It SPANISH      Ta ES   Ta ROMANCE
.It SUNDANESE    Ta SU   Ta OCEANIC/INDONESIAN
.It SWAHILI      Ta SW   Ta NEGRO-AFRICAN
.It SWEDISH      Ta SV   Ta GERMANIC
.It TAGALOG      Ta TL   Ta OCEANIC/INDONESIAN
.It TAJIK        Ta TG   Ta IRANIAN
.It TAMIL        Ta TA   Ta DRAVIDIAN
.It TATAR        Ta TT   Ta TURKIC/ALTAIC
.It TELUGU       Ta TE   Ta DRAVIDIAN
.It THAI         Ta TH   Ta ASIAN
.It TIBETAN      Ta BO   Ta ASIAN
.It TIGRINYA     Ta TI   Ta SEMITIC
.It TONGA        Ta TO   Ta OCEANIC/INDONESIAN
.It TSONGA       Ta TS   Ta NEGRO-AFRICAN
.It TURKISH      Ta TR   Ta TURKIC/ALTAIC
.It TURKMEN      Ta TK   Ta TURKIC/ALTAIC
.It TWI          Ta TW   Ta NEGRO-AFRICAN
.It UIGUR        Ta UG   Ta ""
.It UKRAINIAN    Ta UK   Ta SLAVIC
.It URDU         Ta UR   Ta INDIAN
.It UZBEK        Ta UZ   Ta TURKIC/ALTAIC
.It VIETNAMESE   Ta VI   Ta ASIAN
.It VOLAPUK      Ta VO   Ta INTERNATIONAL AUX.
.It WELSH        Ta CY   Ta CELTIC
.It WOLOF        Ta WO   Ta NEGRO-AFRICAN
.It XHOSA        Ta XH   Ta NEGRO-AFRICAN
.It YIDDISH      Ta YI   Ta GERMANIC
.It YORUBA       Ta YO   Ta NEGRO-AFRICAN
.It ZHUANG       Ta ZA   Ta ""
.It ZULU         Ta ZU   Ta NEGRO-AFRICAN
.El
.Pp
For example, the locale for the Danish language spoken in Denmark
using the ISO 8859-1 character set is da_DK.ISO8859-1.
The da stands for the Danish language and the DK stands for Denmark.
The short form of da_DK is sufficient to indicate this locale.
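.Pp
As a rough illustration of the name format, the following C sketch splits
a locale name into its four fields.
The
.Vt struct locale_parts
type and the
.Fn copy_field
and
.Fn parse_locale_name
helpers are hypothetical names introduced only for this example and are
not part of any system interface.
.Bd -literal -offset indent
#include <string.h>

/* Hypothetical container for the four fields of a locale name. */
struct locale_parts {
	char language[16];
	char territory[16];
	char codeset[32];
	char modifier[32];
};

/* Copy len bytes of s into dst, truncating to fit, and terminate. */
static void
copy_field(char *dst, size_t dstsize, const char *s, size_t len)
{
	if (len >= dstsize)
		len = dstsize - 1;
	memcpy(dst, s, len);
	dst[len] = 0;
}

/* Split language[_territory][.codeset][@modifier] into its fields. */
static void
parse_locale_name(const char *name, struct locale_parts *p)
{
	const char *end = name + strlen(name);
	const char *sep;

	memset(p, 0, sizeof(*p));
	/* Peel the optional fields off the end, right to left. */
	sep = memchr(name, '@', (size_t)(end - name));
	if (sep != NULL) {
		copy_field(p->modifier, sizeof(p->modifier),
		    sep + 1, (size_t)(end - sep - 1));
		end = sep;
	}
	sep = memchr(name, '.', (size_t)(end - name));
	if (sep != NULL) {
		copy_field(p->codeset, sizeof(p->codeset),
		    sep + 1, (size_t)(end - sep - 1));
		end = sep;
	}
	sep = memchr(name, '_', (size_t)(end - name));
	if (sep != NULL) {
		copy_field(p->territory, sizeof(p->territory),
		    sep + 1, (size_t)(end - sep - 1));
		end = sep;
	}
	/* Whatever remains at the front is the language code. */
	copy_field(p->language, sizeof(p->language),
	    name, (size_t)(end - name));
}
.Ed
.Pp
Under these assumptions, da_DK.ISO8859-1 yields the language da, the
territory DK, the codeset ISO8859-1, and an empty modifier.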
.Pp
The environment variable settings are queried by their priority level
in the following manner:
.Bl -bullet
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
Only the
.Ev LANG
environment variable should be set in
.Pa /etc/profile ,
since this makes it easiest for the user to override the system default
using the individual
.Ev LC_*
variables.
.It
If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the ASCII character set and defines
information for the six categories.
.El
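.Pp
The lookup order above can be sketched in C as follows.
The
.Fn effective_locale
function is a hypothetical helper written for this page; in practice
.Xr setlocale 3
performs this lookup itself when called with an empty locale name.
The sketch also assumes that a variable set to the empty string counts as
unset, which is the POSIX rule.
.Bd -literal -offset indent
#include <stdlib.h>

/*
 * Return the locale name that applies to one category, given the name
 * of its environment variable (for example "LC_TIME").
 * Hypothetical helper; empty values are treated as unset.
 */
static const char *
effective_locale(const char *category)
{
	const char *v;

	/* 1. LC_ALL overrides every category. */
	v = getenv("LC_ALL");
	if (v != NULL && *v != 0)
		return v;
	/* 2. Otherwise the category's own variable applies. */
	v = getenv(category);
	if (v != NULL && *v != 0)
		return v;
	/* 3. Otherwise LANG supplies the default. */
	v = getenv("LANG");
	if (v != NULL && *v != 0)
		return v;
	/* 4. With nothing set, the C locale is used. */
	return "C";
}
.Ed
.Pp
For instance, with
.Ev LANG
set to da_DK.ISO8859-1 and
.Ev LC_TIME
set to en_GB.UTF-8, the sketch returns en_GB.UTF-8 for
.Ev LC_TIME
and da_DK.ISO8859-1 for every other category.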
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a
particular language makes up a character set.
It is the encoding values in a character set that provide
the interface between the system and its input and output devices.
.Pp
The following character sets are supported in
.Dx :
.Bl -tag -width ISO_8859_family
.It ASCII
The American Standard Code for Information Interchange (ASCII) standard
specifies 128 Roman characters and control codes, encoded in a 7-bit
character encoding scheme.
.It ISO 8859 family
Industry-standard character sets specified by the ISO/IEC 8859
standard.
The standard is divided into 15 numbered parts, with each
part covering a group of broadly similar scripts.
Examples include Western European, Central European, Arabic, Cyrillic,
Hebrew, Greek, and Turkish.
The character sets use an 8-bit character encoding scheme which is
compatible with the ASCII character set.
.It Unicode
The Unicode character set is the full set of known abstract characters of
all real-world scripts.
It can be used in environments where multiple
scripts must be processed simultaneously.
Unicode is compatible with ISO 8859-1 (Western European) and ASCII.
Many character encoding schemes are available for Unicode, including UTF-8,
UTF-16 and UTF-32.
These encoding schemes are multi-byte encodings.
The UTF-8 encoding scheme uses 8-bit code units in a variable-width
encoding that is compatible with ASCII.
The UTF-16 encoding scheme uses 16-bit code units in a variable-width
encoding.
The UTF-32 encoding scheme uses 32-bit code units in a fixed-width
encoding.
.El
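.Pp
To make the term
.Dq variable-width
concrete, the following C sketch computes how many bytes UTF-8 needs for a
given Unicode code point.
The
.Fn utf8_length
function is a hypothetical helper for illustration only; programs would
normally rely on the conversion functions of the C library instead.
.Bd -literal -offset indent
/*
 * Number of bytes UTF-8 uses to encode code point cp, or 0 if cp
 * cannot be encoded (a surrogate, or beyond U+10FFFF).
 * Hypothetical helper for illustration.
 */
static int
utf8_length(unsigned long cp)
{
	if (cp < 0x80)
		return 1;	/* ASCII range: one byte, same value */
	if (cp < 0x800)
		return 2;
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;	/* reserved for UTF-16 surrogates */
	if (cp < 0x10000)
		return 3;
	if (cp <= 0x10FFFF)
		return 4;
	return 0;
}
.Ed
.Pp
An ASCII letter such as U+0041 takes one byte, the Danish letter U+00E6
takes two, and the euro sign U+20AC takes three.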
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for the
corresponding characters in a character set.
A display must support a suitable font to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
.Xr xterm 1
includes support for Unicode with UTF-8 encoding.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Dx
.Xr syscons 4
console provides support for loading a variety of fonts using the
.Xr vidcontrol 1
utility.
Available fonts can be found in
.Pa /usr/share/syscons/fonts .
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
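.Pp
A minimal sketch of how the two interfaces fit together is shown below.
The
.Fn codeset_for
function is a hypothetical wrapper introduced for this example; only
.Fn setlocale
and
.Fn nl_langinfo
are real library calls.
.Bd -literal -offset indent
#include <langinfo.h>
#include <locale.h>

/*
 * Select the named locale for all categories and return the name of
 * its character encoding, or NULL if the locale cannot be selected.
 * Passing "" selects the locale from the environment.
 * Note that this changes the locale of the whole program.
 * Hypothetical wrapper for illustration.
 */
static const char *
codeset_for(const char *locale_name)
{
	if (setlocale(LC_ALL, locale_name) == NULL)
		return NULL;
	return nl_langinfo(CODESET);
}
.Ed
.Pp
The exact string returned for a given locale is implementation-defined, so
callers should not assume a particular spelling of the codeset name.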
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages, as needed.
.Pp
.Dx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a standard which is
well supported.
Unfortunately, the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also doesn't support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is being supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
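.Pp
As a hedged sketch of the
.Xr catgets 3
interface, the following C function looks up one message and falls back
to a built-in string when no catalog can be opened.
The
.Fn lookup_message
function, and whatever catalog name and set and message numbers a caller
passes to it, are hypothetical; only
.Fn catopen ,
.Fn catgets ,
and
.Fn catclose
are real library calls.
.Bd -literal -offset indent
#include <nl_types.h>
#include <stdio.h>

/*
 * Copy message msg_id of set set_id from the named catalog into buf.
 * If the catalog or the message is missing, the fallback text is
 * copied instead.  bufsize must be greater than zero.
 * Hypothetical helper for illustration.
 */
static void
lookup_message(const char *catalog, int set_id, int msg_id,
    const char *fallback, char *buf, size_t bufsize)
{
	const char *msg = fallback;
	nl_catd cat;

	/* NL_CAT_LOCALE selects the catalog using LC_MESSAGES. */
	cat = catopen(catalog, NL_CAT_LOCALE);
	if (cat != (nl_catd)-1)
		msg = catgets(cat, set_id, msg_id, fallback);
	/* Copy before closing: the text may live inside the catalog. */
	snprintf(buf, bufsize, "%s", msg);
	if (cat != (nl_catd)-1)
		catclose(cat);
}
.Ed
.Pp
Passing the untranslated text as the fallback keeps the program usable
when no catalog is installed for the user's locale.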
.Ss Support for Multi-byte Encodings
Some character sets with multi-byte encodings may be difficult to decode,
or may contain state (i.e., adjacent characters are dependent).
ISO C specifies a set of functions using
.Dq wide characters
which can handle multi-byte encodings properly.
The behaviour of these functions is affected
by the
.Ev LC_CTYPE
category of the current locale.
.Pp
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
There are two types for wide characters:
.Em wchar_t
and
.Em wint_t .
.Em wchar_t
is a type which can contain one wide character and behaves as the
.Em char
type does for one character.
.Em wint_t
can contain one wide character or WEOF (wide EOF).
.Pp
There are functions that operate on
.Em wchar_t ,
and substitute for functions operating on
.Em char .
See
.Xr wmemchr 3
and
.Xr towlower 3
for details.
There are some additional functions that operate on
.Em wchar_t .
See
.Xr wctype 3
and
.Xr wctrans 3
for details.
.Pp
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
The two primary approaches to performing I/O with wide characters are:
.Bl -bullet -offset indent
.It
All I/O is performed using multibyte characters.
Input data is converted into wide characters immediately after
reading and data for output is converted from wide characters to
multi-byte encoding immediately before writing.
Conversion is handled with
.Xr mbstowcs 3 ,
.Xr mbsrtowcs 3 ,
.Xr wcstombs 3 ,
.Xr wcsrtombs 3 ,
.Xr mblen 3 ,
.Xr mbrlen 3 ,
and
.Xr mbsinit 3 .
.It
Wide characters are used directly for I/O, using
.Xr getwchar 3 ,
.Xr fgetwc 3 ,
.Xr getwc 3 ,
.Xr ungetwc 3 ,
.Xr fgetws 3 ,
.Xr putwchar 3 ,
.Xr fputwc 3 ,
.Xr putwc 3 ,
and
.Xr fputws 3 .
They are also used with the formatted I/O functions for wide characters
such as
.Xr fwscanf 3 ,
.Xr wscanf 3 ,
.Xr swscanf 3 ,
.Xr fwprintf 3 ,
.Xr wprintf 3 ,
.Xr swprintf 3 ,
.Xr vfwprintf 3 ,
.Xr vwprintf 3 ,
and
.Xr vswprintf 3 ,
and with the wide-character conversion specifiers %lc, %C, %ls, and %S
of the conventional formatted I/O functions.
.El
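.Pp
The first approach can be sketched in C with
.Xr mbsrtowcs 3 .
The
.Fn count_characters
and
.Fn to_wide
functions are hypothetical helpers for this example.
Both interpret their input according to the
.Ev LC_CTYPE
category of the current locale, so a program would normally call
.Xr setlocale 3
before using them.
.Bd -literal -offset indent
#include <string.h>
#include <wchar.h>

/*
 * Count the characters, not the bytes, in a multibyte string.
 * Returns (size_t)-1 if the string holds an invalid sequence.
 * Hypothetical helper for illustration.
 */
static size_t
count_characters(const char *mbs)
{
	mbstate_t state;
	const char *src = mbs;

	memset(&state, 0, sizeof(state));
	/* With a null destination only the length is computed. */
	return mbsrtowcs(NULL, &src, 0, &state);
}

/*
 * Convert mbs into dst, which has room for n wide characters
 * including the terminator.  Returns the number of wide characters
 * stored, not counting the terminator, or (size_t)-1 on error.
 * Hypothetical helper for illustration.
 */
static size_t
to_wide(const char *mbs, wchar_t *dst, size_t n)
{
	mbstate_t state;
	const char *src = mbs;
	size_t len;

	if (n == 0)
		return (size_t)-1;
	memset(&state, 0, sizeof(state));
	len = mbsrtowcs(dst, &src, n - 1, &state);
	if (len != (size_t)-1)
		dst[len] = 0;	/* terminate even when truncated */
	return len;
}
.Ed
.Pp
In a UTF-8 locale the character count of a string can be smaller than its
byte length, which is why the sketch measures in wide characters.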
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr vidcontrol 1 ,
.Xr xfd 1 ,
.Xr xterm 1 ,
.Xr catgets 3 ,
.Xr gettext 3 Pq Pa devel/gettext ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3
.Sh BUGS
This man page is incomplete.