.\"	$NetBSD: nls.7,v 1.11 2003/06/26 11:55:56 wiz Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"        This product includes software developed by the NetBSD
.\"        Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\" $DragonFly: src/share/man/man7/nls.7,v 1.5 2006/10/14 23:59:59 swildner Exp $
.\"
.Dd May 17, 2003
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides commands for a single
worldwide operating system base.
An internationalized system has no built-in assumptions or dependencies
on language-specific or cultural-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.It
Message-text language
.It
Character sets
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
(often abbreviated
.Dq i18n )
refers to the operation by which system software is developed to support
multiple cultural-specific and language-specific conventions.
This is a generalization process by which the system is untied from
using only English strings or other English-specific conventions.
.Dq Localization
(often abbreviated
.Dq l10n )
refers to the operations by which the user environment is customized to
handle its input and output appropriately for specific language and cultural
conventions.
This is a specialization process, by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of cultural conventions for some country, together
with all associated translations targeted to the native language, is
called the
.Dq locale .
.Pp
.Dx
provides extensive support to programmers and system developers to
enable internationalized software to be developed.
.Dx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
ISO C and POSIX together specify the following six standard categories
supported by
.Dx :
.Pp
.Bl -tag -compact -width LC_MONETARYXX
.It LC_COLLATE
string-collation order information
.It LC_CTYPE
character classification, case conversion, and other character attributes
.It LC_MESSAGES
the format for affirmative and negative responses
(specified by POSIX rather than ISO C)
.It LC_MONETARY
rules and symbols for formatting monetary numeric information
.It LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of directory names
where the message catalog files of the NLS database are located.
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale.
.Pp
The values of these environment variables are strings of the form:
.Pp
.Bd -literal
	language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO639 standard, which
defines two-character codes for many languages.
Some common language codes are:
.Pp
.nf
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
\fILanguage Name\fP	\fICode\fP	\fILanguage Family\fP
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
.sp 5p
ABKHAZIAN	AB	IBERO-CAUCASIAN
AFAN (OROMO)	OM	HAMITIC
AFAR	AA	HAMITIC
AFRIKAANS	AF	GERMANIC
ALBANIAN	SQ	INDO-EUROPEAN (OTHER)
AMHARIC	AM	SEMITIC
ARABIC	AR	SEMITIC
ARMENIAN	HY	INDO-EUROPEAN (OTHER)
ASSAMESE	AS	INDIAN
AYMARA	AY	AMERINDIAN
AZERBAIJANI	AZ	TURKIC/ALTAIC
BASHKIR	BA	TURKIC/ALTAIC
BASQUE	EU	BASQUE
BENGALI	BN	INDIAN
BHUTANI	DZ	ASIAN
BIHARI	BH	INDIAN
BISLAMA	BI
BRETON	BR	CELTIC
BULGARIAN	BG	SLAVIC
BURMESE	MY	ASIAN
BYELORUSSIAN	BE	SLAVIC
CAMBODIAN	KM	ASIAN
CATALAN	CA	ROMANCE
CHINESE	ZH	ASIAN
CORSICAN	CO	ROMANCE
CROATIAN	HR	SLAVIC
CZECH	CS	SLAVIC
DANISH	DA	GERMANIC
DUTCH	NL	GERMANIC
ENGLISH	EN	GERMANIC
ESPERANTO	EO	INTERNATIONAL AUX.
ESTONIAN	ET	FINNO-UGRIC
FAROESE	FO	GERMANIC
FIJI	FJ	OCEANIC/INDONESIAN
FINNISH	FI	FINNO-UGRIC
FRENCH	FR	ROMANCE
FRISIAN	FY	GERMANIC
GALICIAN	GL	ROMANCE
GEORGIAN	KA	IBERO-CAUCASIAN
GERMAN	DE	GERMANIC
GREEK	EL	LATIN/GREEK
GREENLANDIC	KL	ESKIMO
GUARANI	GN	AMERINDIAN
GUJARATI	GU	INDIAN
HAUSA	HA	NEGRO-AFRICAN
HEBREW	HE	SEMITIC
HINDI	HI	INDIAN
HUNGARIAN	HU	FINNO-UGRIC
ICELANDIC	IS	GERMANIC
INDONESIAN	ID	OCEANIC/INDONESIAN
INTERLINGUA	IA	INTERNATIONAL AUX.
INTERLINGUE	IE	INTERNATIONAL AUX.
INUKTITUT	IU
INUPIAK	IK	ESKIMO
IRISH	GA	CELTIC
ITALIAN	IT	ROMANCE
JAPANESE	JA	ASIAN
JAVANESE	JV	OCEANIC/INDONESIAN
KANNADA	KN	DRAVIDIAN
KASHMIRI	KS	INDIAN
KAZAKH	KK	TURKIC/ALTAIC
KINYARWANDA	RW	NEGRO-AFRICAN
KIRGHIZ	KY	TURKIC/ALTAIC
KURUNDI	RN	NEGRO-AFRICAN
KOREAN	KO	ASIAN
KURDISH	KU	IRANIAN
LAOTHIAN	LO	ASIAN
LATIN	LA	LATIN/GREEK
LATVIAN	LV	BALTIC
LINGALA	LN	NEGRO-AFRICAN
LITHUANIAN	LT	BALTIC
MACEDONIAN	MK	SLAVIC
MALAGASY	MG	OCEANIC/INDONESIAN
MALAY	MS	OCEANIC/INDONESIAN
MALAYALAM	ML	DRAVIDIAN
MALTESE	MT	SEMITIC
MAORI	MI	OCEANIC/INDONESIAN
MARATHI	MR	INDIAN
MOLDAVIAN	MO	ROMANCE
MONGOLIAN	MN
NAURU	NA
NEPALI	NE	INDIAN
NORWEGIAN	NO	GERMANIC
OCCITAN	OC	ROMANCE
ORIYA	OR	INDIAN
PASHTO	PS	IRANIAN
PERSIAN (farsi)	FA	IRANIAN
POLISH	PL	SLAVIC
PORTUGUESE	PT	ROMANCE
PUNJABI	PA	INDIAN
QUECHUA	QU	AMERINDIAN
RHAETO-ROMANCE	RM	ROMANCE
ROMANIAN	RO	ROMANCE
RUSSIAN	RU	SLAVIC
SAMOAN	SM	OCEANIC/INDONESIAN
SANGHO	SG	NEGRO-AFRICAN
SANSKRIT	SA	INDIAN
SCOTS GAELIC	GD	CELTIC
SERBIAN	SR	SLAVIC
SERBO-CROATIAN	SH	SLAVIC
SESOTHO	ST	NEGRO-AFRICAN
SETSWANA	TN	NEGRO-AFRICAN
SHONA	SN	NEGRO-AFRICAN
SINDHI	SD	INDIAN
SINGHALESE	SI	INDIAN
SISWATI	SS	NEGRO-AFRICAN
SLOVAK	SK	SLAVIC
SLOVENIAN	SL	SLAVIC
SOMALI	SO	HAMITIC
SPANISH	ES	ROMANCE
SUNDANESE	SU	OCEANIC/INDONESIAN
SWAHILI	SW	NEGRO-AFRICAN
SWEDISH	SV	GERMANIC
TAGALOG	TL	OCEANIC/INDONESIAN
TAJIK	TG	IRANIAN
TAMIL	TA	DRAVIDIAN
TATAR	TT	TURKIC/ALTAIC
TELUGU	TE	DRAVIDIAN
THAI	TH	ASIAN
TIBETAN	BO	ASIAN
TIGRINYA	TI	SEMITIC
TONGA	TO	OCEANIC/INDONESIAN
TSONGA	TS	NEGRO-AFRICAN
TURKISH	TR	TURKIC/ALTAIC
TURKMEN	TK	TURKIC/ALTAIC
TWI	TW	NEGRO-AFRICAN
UIGUR	UG
UKRAINIAN	UK	SLAVIC
URDU	UR	INDIAN
UZBEK	UZ	TURKIC/ALTAIC
VIETNAMESE	VI	ASIAN
VOLAPUK	VO	INTERNATIONAL AUX.
WELSH	CY	CELTIC
WOLOF	WO	NEGRO-AFRICAN
XHOSA	XH	NEGRO-AFRICAN
YIDDISH	YI	GERMANIC
YORUBA	YO	NEGRO-AFRICAN
ZHUANG	ZA
ZULU	ZU	NEGRO-AFRICAN
.ta
.fi
.Pp
For example, the locale for the Danish language spoken in Denmark
using the ISO8859-1 character set is da_DK.ISO8859-1.
The da stands for the Danish language and the DK stands for Denmark.
The short form of da_DK is sufficient to indicate this locale.
.Pp
The environment variable settings are queried by their priority level
in the following manner:
.Pp
.Bl -bullet
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
Only the
.Ev LANG
environment variable should be set in
.Pa /etc/profile ,
since this makes it easiest for users to override the system default
using the individual
.Ev LC_*
variables.
.It
If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the 7-bit ASCII character set and defines
information for the six categories.
.El
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a
particular language makes up a character set.
It is the encoding values in a character set that provide
the interface between the system and its input and output devices.
.Pp
The following character sets are supported in
.Dx :
.Bl -tag -width ISO8859_family
.It ISO8859 family
Industry-standard character sets are provided by means of the ISO8859
family of character sets, which provide a range of single-byte character set
support that includes Latin-1, Latin-2, Arabic, Cyrillic, Hebrew,
Greek, and Turkish.
The eucJP character set is the industry-standard character set used to support
the Japanese locale.
.It Unicode
A Unicode environment based on the UTF-8 character set is supported for all
supported languages and territories.
UTF-8 provides character support for most of the major languages of the
world and can be used in environments where multiple languages must be
processed simultaneously.
.El
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for a
corresponding character in a character set.
A display must support a suitable font to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
.Xr xterm 1
includes support for UTF-8 character sets.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Dx
.Xr syscons 4
console provides support for loading a variety of fonts using the
.Xr vidcontrol 1
utility.
Available fonts can be found in
.Pa /usr/share/syscons/fonts .
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
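.Pp
As a minimal sketch of the run-time catalog access described above, the
following C fragment retrieves a message through the X/Open
.Xr catgets 3
interface and falls back to a built-in string when no catalog is available.
The helper names
.Fn open_catalog
and
.Fn catalog_message
are hypothetical and exist only for this illustration; the catalog name and
the set and message numbers are whatever the application's own message
source file defines.
.Bd -literal
```c
#include <nl_types.h>

/*
 * Open a message catalog for the current LC_MESSAGES locale.
 * Returns (nl_catd)-1 when no matching catalog can be opened.
 */
nl_catd
open_catalog(const char *name)
{
	return catopen(name, NL_CAT_LOCALE);
}

/*
 * Fetch message msg_id from set set_id.  The built-in fallback text
 * is returned when the catalog is unavailable or lacks the message,
 * so the program still prints something useful without a catalog.
 */
const char *
catalog_message(nl_catd catd, int set_id, int msg_id, const char *fallback)
{
	if (catd == (nl_catd)-1)
		return fallback;
	return catgets(catd, set_id, msg_id, fallback);
}
```
.Ed
.Pp
A caller would normally select the locale with
.Xr setlocale 3
before opening the catalog, keep the descriptor open for as long as the
returned strings are in use, and release it with
.Xr catclose 3
afterwards.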
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages, as needed.
.Pp
.Dx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a standard which is
well supported.
Unfortunately the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also does not support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
.Ss Support for Multibyte Characters and Wide Characters
Character sets with multibyte characters may be difficult to decode, or may
contain state (i.e., adjacent characters are dependent).
ISO C specifies a set of functions using
.Dq wide characters
which can handle multibyte characters properly.
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
.Pp
There are two types for wide characters:
.Em wchar_t
and
.Em wint_t .
.Em wchar_t
is a type which can contain one wide character and operates as the
.Em char
type does for one character.
.Em wint_t
can contain one wide character or WEOF (wide EOF).
.Pp
There are functions that operate on
.Em wchar_t ,
and substitute for functions operating on
.Em char .
See
.Xr wmemchr 3
and
.Xr towlower 3
for details.
There are some additional functions that operate on
.Em wchar_t .
See
.Xr wctype 3
and
.Xr wctrans 3
for details.
.Pp
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
There are two primary approaches to using wide characters for I/O:
.Bl -bullet -offset indent
.It
All I/O is performed using multibyte characters.
Input data is converted into wide characters immediately after
reading and data for output is converted from wide characters to
multibyte characters immediately before writing.
Conversion is achieved using
.Xr mbstowcs 3 ,
.Xr mbsrtowcs 3 ,
.Xr wcstombs 3 ,
.Xr wcsrtombs 3 ,
.Xr mblen 3 ,
.Xr mbrlen 3 ,
and
.Xr mbsinit 3 .
.It
Wide characters are used directly for I/O, using
.Xr getwchar 3 ,
.Xr fgetwc 3 ,
.Xr getwc 3 ,
.Xr ungetwc 3 ,
.Xr fgetws 3 ,
.Xr putwchar 3 ,
.Xr fputwc 3 ,
.Xr putwc 3 ,
and
.Xr fputws 3 .
They are also used with the formatted I/O functions for wide characters
such as
.Xr fwscanf 3 ,
.Xr wscanf 3 ,
.Xr swscanf 3 ,
.Xr fwprintf 3 ,
.Xr wprintf 3 ,
.Xr swprintf 3 ,
.Xr vfwprintf 3 ,
.Xr vwprintf 3 ,
and
.Xr vswprintf 3 ,
and with the wide-character conversion specifiers %lc, %C, %ls, and %S
in the conventional formatted I/O functions.
.El
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr vidcontrol 1 ,
.Xr xfd 1 ,
.Xr xterm 1 ,
.Xr catgets 3 ,
.Xr gettext 3 ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3
.Sh BUGS
This man page is incomplete.
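.Sh EXAMPLES
The priority order of the locale environment variables given under
.Sx Localization of Information
can be sketched in C as follows.
This is an illustration only, not the code used by
.Xr setlocale 3 ;
the function names
.Fn effective_locale
and
.Fn effective_locale_from_env
are hypothetical, and the sketch assumes that a variable holding an empty
string is treated the same as an unset variable.
.Bd -literal
```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption: a variable counts as set only when present and non-empty. */
static int
is_set(const char *value)
{
	return value != NULL && value[0] != 0;
}

/*
 * Pick the locale name for one category from the three candidate
 * values, in priority order: LC_ALL, the category's own variable,
 * then LANG.  Falls back to the C locale when none of them is set.
 */
const char *
effective_locale(const char *lc_all, const char *lc_category,
    const char *lang)
{
	if (is_set(lc_all))
		return lc_all;
	if (is_set(lc_category))
		return lc_category;
	if (is_set(lang))
		return lang;
	return "C";
}

/*
 * Convenience wrapper that reads the process environment;
 * category_var is a variable name such as "LC_TIME".
 */
const char *
effective_locale_from_env(const char *category_var)
{
	return effective_locale(getenv("LC_ALL"), getenv(category_var),
	    getenv("LANG"));
}
```
.Ed
.Pp
With only
.Ev LANG
set to da_DK.ISO8859-1, a call for
.Ev LC_TIME
yields da_DK.ISO8859-1; setting
.Ev LC_ALL
as well overrides that result for every category.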