.\" $NetBSD: nls.7,v 1.15 2009/04/09 02:51:54 joerg Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd November 24, 2013
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides commands for a single
worldwide operating system base.
An internationalized system has no built-in assumptions or dependencies
on language-specific or cultural-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.It
Message-text language
.It
Character sets
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
(often abbreviated
.Dq i18n )
refers to the operation by which system software is developed to support
multiple cultural-specific and language-specific conventions.
This is a generalization process by which the system is untied from
calling only English strings or other English-specific conventions.
.Dq Localization
(often abbreviated
.Dq l10n )
refers to the operations by which the user environment is customized to
handle its input and output appropriate for specific language and cultural
conventions.
This is a specialization process, by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of cultural conventions for some country, together
with all associated translations targeted to the native language, is
called the
.Dq locale .
.Pp
.Dx
provides extensive support to programmers and system developers to
enable internationalized software to be developed.
.Dx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
ISO C and POSIX specify the following six standard categories supported by
.Dx :
.Pp
.Bl -tag -compact -width ".Ev LC_MONETARY"
.It Ev LC_COLLATE
string-collation order information
.It Ev LC_CTYPE
character classification, case conversion, and other character attributes
.It Ev LC_MESSAGES
the format for affirmative and negative responses
.It Ev LC_MONETARY
rules and symbols for formatting monetary numeric information
.It Ev LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It Ev LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of directory names
where the message catalog files of the NLS database are located.
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale.
.Pp
The value of each of these environment variables is a string of the form:
.Bd -literal
 language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-character codes for many languages.
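.Pp
The locale name format shown above can be taken apart with ordinary string
handling.
The following is a minimal sketch, not part of any system interface: the
.Vt struct locale_parts
type and the
.Fn parse_locale_name
and
.Fn copy_field
helpers are hypothetical names introduced only for illustration, and the
field sizes are arbitrary assumptions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical container for the four fields; not a system type. */
struct locale_parts {
	char language[16];
	char territory[16];
	char codeset[32];
	char modifier[32];
};

/* Copy at most dstsize-1 bytes from src into dst and terminate it. */
static void
copy_field(char *dst, size_t dstsize, const char *src, size_t len)
{
	if (len >= dstsize)
		len = dstsize - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
}

/*
 * Split "language[_territory][.codeset][@modifier]" into its fields.
 * Optional fields that are absent are left as empty strings.
 * Returns 0 on success, or -1 if the language field is empty.
 */
static int
parse_locale_name(const char *name, struct locale_parts *out)
{
	const char *p;
	size_t n;

	assert(name != NULL && out != NULL);
	memset(out, 0, sizeof(*out));

	/* The language runs up to the first '_', '.', or '@'. */
	n = strcspn(name, "_.@");
	if (n == 0)
		return -1;
	copy_field(out->language, sizeof(out->language), name, n);
	p = name + n;

	if (*p == '_') {		/* optional territory */
		p++;
		n = strcspn(p, ".@");
		copy_field(out->territory, sizeof(out->territory), p, n);
		p += n;
	}
	if (*p == '.') {		/* optional codeset */
		p++;
		n = strcspn(p, "@");
		copy_field(out->codeset, sizeof(out->codeset), p, n);
		p += n;
	}
	if (*p == '@') {		/* optional modifier */
		p++;
		copy_field(out->modifier, sizeof(out->modifier), p, strlen(p));
	}
	return 0;
}
```

Applications normally do not need to parse locale names themselves, since
.Xr setlocale 3
interprets them; a helper like this is mainly useful for diagnostics.
.Pp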
Some common language codes are:
.Bl -column "PERSIAN (farsi)" "Sy Code" "OCEANIC/INDONESIAN"
.It Sy Language Name Ta Sy Code Ta Sy Language Family
.It ABKHAZIAN Ta AB Ta IBERO-CAUCASIAN
.It AFAN (OROMO) Ta OM Ta HAMITIC
.It AFAR Ta AA Ta HAMITIC
.It AFRIKAANS Ta AF Ta GERMANIC
.It ALBANIAN Ta SQ Ta INDO-EUROPEAN (OTHER)
.It AMHARIC Ta AM Ta SEMITIC
.It ARABIC Ta AR Ta SEMITIC
.It ARMENIAN Ta HY Ta INDO-EUROPEAN (OTHER)
.It ASSAMESE Ta AS Ta INDIAN
.It AYMARA Ta AY Ta AMERINDIAN
.It AZERBAIJANI Ta AZ Ta TURKIC/ALTAIC
.It BASHKIR Ta BA Ta TURKIC/ALTAIC
.It BASQUE Ta EU Ta BASQUE
.It BENGALI Ta BN Ta INDIAN
.It BHUTANI Ta DZ Ta ASIAN
.It BIHARI Ta BH Ta INDIAN
.It BISLAMA Ta BI Ta ""
.It BRETON Ta BR Ta CELTIC
.It BULGARIAN Ta BG Ta SLAVIC
.It BURMESE Ta MY Ta ASIAN
.It BYELORUSSIAN Ta BE Ta SLAVIC
.It CAMBODIAN Ta KM Ta ASIAN
.It CATALAN Ta CA Ta ROMANCE
.It CHINESE Ta ZH Ta ASIAN
.It CORSICAN Ta CO Ta ROMANCE
.It CROATIAN Ta HR Ta SLAVIC
.It CZECH Ta CS Ta SLAVIC
.It DANISH Ta DA Ta GERMANIC
.It DUTCH Ta NL Ta GERMANIC
.It ENGLISH Ta EN Ta GERMANIC
.It ESPERANTO Ta EO Ta INTERNATIONAL AUX.
.It ESTONIAN Ta ET Ta FINNO-UGRIC
.It FAROESE Ta FO Ta GERMANIC
.It FIJI Ta FJ Ta OCEANIC/INDONESIAN
.It FINNISH Ta FI Ta FINNO-UGRIC
.It FRENCH Ta FR Ta ROMANCE
.It FRISIAN Ta FY Ta GERMANIC
.It GALICIAN Ta GL Ta ROMANCE
.It GEORGIAN Ta KA Ta IBERO-CAUCASIAN
.It GERMAN Ta DE Ta GERMANIC
.It GREEK Ta EL Ta LATIN/GREEK
.It GREENLANDIC Ta KL Ta ESKIMO
.It GUARANI Ta GN Ta AMERINDIAN
.It GUJARATI Ta GU Ta INDIAN
.It HAUSA Ta HA Ta NEGRO-AFRICAN
.It HEBREW Ta HE Ta SEMITIC
.It HINDI Ta HI Ta INDIAN
.It HUNGARIAN Ta HU Ta FINNO-UGRIC
.It ICELANDIC Ta IS Ta GERMANIC
.It INDONESIAN Ta ID Ta OCEANIC/INDONESIAN
.It INTERLINGUA Ta IA Ta INTERNATIONAL AUX.
.It INTERLINGUE Ta IE Ta INTERNATIONAL AUX.
.It INUKTITUT Ta IU Ta ""
.It INUPIAK Ta IK Ta ESKIMO
.It IRISH Ta GA Ta CELTIC
.It ITALIAN Ta IT Ta ROMANCE
.It JAPANESE Ta JA Ta ASIAN
.It JAVANESE Ta JV Ta OCEANIC/INDONESIAN
.It KANNADA Ta KN Ta DRAVIDIAN
.It KASHMIRI Ta KS Ta INDIAN
.It KAZAKH Ta KK Ta TURKIC/ALTAIC
.It KINYARWANDA Ta RW Ta NEGRO-AFRICAN
.It KIRGHIZ Ta KY Ta TURKIC/ALTAIC
.It KURUNDI Ta RN Ta NEGRO-AFRICAN
.It KOREAN Ta KO Ta ASIAN
.It KURDISH Ta KU Ta IRANIAN
.It LAOTHIAN Ta LO Ta ASIAN
.It LATIN Ta LA Ta LATIN/GREEK
.It LATVIAN Ta LV Ta BALTIC
.It LINGALA Ta LN Ta NEGRO-AFRICAN
.It LITHUANIAN Ta LT Ta BALTIC
.It MACEDONIAN Ta MK Ta SLAVIC
.It MALAGASY Ta MG Ta OCEANIC/INDONESIAN
.It MALAY Ta MS Ta OCEANIC/INDONESIAN
.It MALAYALAM Ta ML Ta DRAVIDIAN
.It MALTESE Ta MT Ta SEMITIC
.It MAORI Ta MI Ta OCEANIC/INDONESIAN
.It MARATHI Ta MR Ta INDIAN
.It MOLDAVIAN Ta MO Ta ROMANCE
.It MONGOLIAN Ta MN Ta ""
.It NAURU Ta NA Ta ""
.It NEPALI Ta NE Ta INDIAN
.It NORWEGIAN Ta NO Ta GERMANIC
.It OCCITAN Ta OC Ta ROMANCE
.It ORIYA Ta OR Ta INDIAN
.It PASHTO Ta PS Ta IRANIAN
.It PERSIAN (farsi) Ta FA Ta IRANIAN
.It POLISH Ta PL Ta SLAVIC
.It PORTUGUESE Ta PT Ta ROMANCE
.It PUNJABI Ta PA Ta INDIAN
.It QUECHUA Ta QU Ta AMERINDIAN
.It RHAETO-ROMANCE Ta RM Ta ROMANCE
.It ROMANIAN Ta RO Ta ROMANCE
.It RUSSIAN Ta RU Ta SLAVIC
.It SAMOAN Ta SM Ta OCEANIC/INDONESIAN
.It SANGHO Ta SG Ta NEGRO-AFRICAN
.It SANSKRIT Ta SA Ta INDIAN
.It SCOTS GAELIC Ta GD Ta CELTIC
.It SERBIAN Ta SR Ta SLAVIC
.It SERBO-CROATIAN Ta SH Ta SLAVIC
.It SESOTHO Ta ST Ta NEGRO-AFRICAN
.It SETSWANA Ta TN Ta NEGRO-AFRICAN
.It SHONA Ta SN Ta NEGRO-AFRICAN
.It SINDHI Ta SD Ta INDIAN
.It SINGHALESE Ta SI Ta INDIAN
.It SISWATI Ta SS Ta NEGRO-AFRICAN
.It SLOVAK Ta SK Ta SLAVIC
.It SLOVENIAN Ta SL Ta SLAVIC
.It SOMALI Ta SO Ta HAMITIC
.It SPANISH Ta ES Ta ROMANCE
.It SUNDANESE Ta SU Ta OCEANIC/INDONESIAN
.It SWAHILI Ta SW Ta NEGRO-AFRICAN
.It SWEDISH Ta SV Ta GERMANIC
.It TAGALOG Ta TL Ta OCEANIC/INDONESIAN
.It TAJIK Ta TG Ta IRANIAN
.It TAMIL Ta TA Ta DRAVIDIAN
.It TATAR Ta TT Ta TURKIC/ALTAIC
.It TELUGU Ta TE Ta DRAVIDIAN
.It THAI Ta TH Ta ASIAN
.It TIBETAN Ta BO Ta ASIAN
.It TIGRINYA Ta TI Ta SEMITIC
.It TONGA Ta TO Ta OCEANIC/INDONESIAN
.It TSONGA Ta TS Ta NEGRO-AFRICAN
.It TURKISH Ta TR Ta TURKIC/ALTAIC
.It TURKMEN Ta TK Ta TURKIC/ALTAIC
.It TWI Ta TW Ta NEGRO-AFRICAN
.It UIGUR Ta UG Ta ""
.It UKRAINIAN Ta UK Ta SLAVIC
.It URDU Ta UR Ta INDIAN
.It UZBEK Ta UZ Ta TURKIC/ALTAIC
.It VIETNAMESE Ta VI Ta ASIAN
.It VOLAPUK Ta VO Ta INTERNATIONAL AUX.
.It WELSH Ta CY Ta CELTIC
.It WOLOF Ta WO Ta NEGRO-AFRICAN
.It XHOSA Ta XH Ta NEGRO-AFRICAN
.It YIDDISH Ta YI Ta GERMANIC
.It YORUBA Ta YO Ta NEGRO-AFRICAN
.It ZHUANG Ta ZA Ta ""
.It ZULU Ta ZU Ta NEGRO-AFRICAN
.El
.Pp
For example, the locale for the Danish language spoken in Denmark
using the ISO 8859-1 character set is da_DK.ISO8859-1.
The da stands for the Danish language and the DK stands for Denmark.
The short form of da_DK is sufficient to indicate this locale.
.Pp
The environment variable settings are queried by their priority level
in the following manner:
.Bl -bullet
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
Only the
.Ev LANG
environment variable should be set in /etc/profile, since this makes it
easiest for the user to override the system default using the individual
.Ev LC_*
variables.
.It
If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the ASCII character set and defines
information for the six categories.
327.El 328.Ss Character Sets 329A character is any symbol used for the organization, control, or 330representation of data. 331A group of such symbols used to describe a 332particular language make up a character set. 333It is the encoding values in a character set that provide 334the interface between the system and its input and output devices. 335.Pp 336The following character sets are supported in 337.Dx : 338.Bl -tag -width ISO_8859_family 339.It ASCII 340The American Standard Code for Information Exchange (ASCII) standard 341specifies 128 Roman characters and control codes, encoded in a 7-bit 342character encoding scheme. 343.It ISO 8859 family 344Industry-standard character sets specified by the ISO/IEC 8859 345standard. 346The standard is divided into 15 numbered parts, with each 347part specifying broad script similarities. 348Examples include Western European, Central European, Arabic, Cyrillic, 349Hebrew, Greek, and Turkish. 350The character sets use an 8-bit character encoding scheme which is 351compatible with the ASCII character set. 352.It Unicode 353The Unicode character set is the full set of known abstract characters of 354all real-world scripts. It can be used in environments where multiple 355scripts must be processed simultaneously. 356Unicode is compatible with ISO 8859-1 (Western European) and ASCII. 357Many character encoding schemes are available for Unicode, including UTF-8, 358UTF-16 and UTF-32. 359These encoding schemes are multi-byte encodings. 360The UTF-8 encoding scheme uses 8-bit, variable-width encodings which is 361compatible with ASCII. 362The UTF-16 encoding scheme uses 16-bit, variable-width encodings. 363The UTF-32 encoding scheme using 32-bit, fixed-width encodings. 364.El 365.Ss Font Sets 366A font set contains the glyphs to be displayed on the screen for a 367corresponding character in a character set. 368A display must support a suitable font to display a character set. 
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
.Xr xterm 1
includes support for Unicode with UTF-8 encoding.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Dx
.Xr syscons 4
console provides support for loading a variety of fonts using the
.Xr vidcontrol 1
utility.
Available fonts can be found in
.Pa /usr/share/syscons/fonts .
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages, as needed.
.Pp
.Dx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a standard which is
well supported.
Unfortunately, the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also doesn't support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
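.Pp
As a rough illustration of the
.Xr catgets 3
and
.Xr nl_langinfo 3
interfaces mentioned above, the following sketch looks up one message and
falls back to a built-in string when no catalog can be opened.
The helper names
.Fn fetch_message
and
.Fn current_codeset
are hypothetical, as are any catalog name and message numbers passed to them.
The calling program is assumed to have already called
.Fn setlocale LC_ALL \&"" .

```c
#include <assert.h>
#include <langinfo.h>
#include <nl_types.h>
#include <stdio.h>

/*
 * Copy message msg_id of set set_id from the named catalog into buf.
 * If the catalog cannot be opened or the message is missing, the
 * fallback text is copied instead.  The text is copied because the
 * pointer returned by catgets() must not be used after catclose().
 */
static const char *
fetch_message(const char *catalog, int set_id, int msg_id,
    const char *fallback, char *buf, size_t bufsize)
{
	const char *text = fallback;
	nl_catd catd;

	assert(buf != NULL && bufsize > 0);

	/* NL_CAT_LOCALE selects the catalog according to LC_MESSAGES. */
	catd = catopen(catalog, NL_CAT_LOCALE);
	if (catd != (nl_catd)-1)
		text = catgets(catd, set_id, msg_id, fallback);

	snprintf(buf, bufsize, "%s", text);

	if (catd != (nl_catd)-1)
		catclose(catd);
	return buf;
}

/* Name of the character encoding of the current LC_CTYPE locale. */
static const char *
current_codeset(void)
{
	return nl_langinfo(CODESET);
}
```

A caller might use
.Fn fetch_message
with a catalog produced by
.Xr gencat 1 ;
when the catalog is not installed, the fallback string is what gets printed.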
.Ss Support for Multi-byte Encodings
Some character sets with multi-byte encodings may be difficult to decode,
or may contain state (i.e., adjacent characters are dependent).
ISO C specifies a set of functions using 'wide characters' which can handle
multi-byte encodings properly.
The behaviour of these functions is affected
by the
.Ev LC_CTYPE
category of the current locale.
.Pp
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
There are two types for wide characters:
.Em wchar_t
and
.Em wint_t .
.Em wchar_t
is a type which can contain one wide character and operates as the 'char'
type does for one character.
.Em wint_t
can contain one wide character or WEOF (wide EOF).
.Pp
There are functions that operate on
.Em wchar_t ,
and substitute for functions operating on 'char'.
See
.Xr wmemchr 3
and
.Xr towlower 3
for details.
There are some additional functions that operate on
.Em wchar_t .
See
.Xr wctype 3
and
.Xr wctrans 3
for details.
.Pp
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
The two primary issues requiring special use of wide characters are:
.Bl -bullet -offset indent
.It
All I/O is performed using multibyte characters.
Input data is converted into wide characters immediately after
reading and data for output is converted from wide characters to
multi-byte encoding immediately before writing.
Conversion is controlled by the
.Xr mbstowcs 3 ,
.Xr mbsrtowcs 3 ,
.Xr wcstombs 3 ,
.Xr wcsrtombs 3 ,
.Xr mblen 3 ,
.Xr mbrlen 3 ,
and
.Xr mbsinit 3
functions.
.It
Wide characters are used directly for I/O, using
.Xr getwchar 3 ,
.Xr fgetwc 3 ,
.Xr getwc 3 ,
.Xr ungetwc 3 ,
.Xr fgetws 3 ,
.Xr putwchar 3 ,
.Xr fputwc 3 ,
.Xr putwc 3 ,
and
.Xr fputws 3 .
They are also used for formatted I/O functions for wide characters
such as
.Xr fwscanf 3 ,
.Xr wscanf 3 ,
.Xr swscanf 3 ,
.Xr fwprintf 3 ,
.Xr wprintf 3 ,
.Xr swprintf 3 ,
.Xr vfwprintf 3 ,
.Xr vwprintf 3 ,
and
.Xr vswprintf 3 ,
and with the wide-character conversion specifiers %lc, %C, %ls, and %S
of the conventional formatted I/O functions.
.El
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr vidcontrol 1 ,
.Xr xfd 1 ,
.Xr xterm 1 ,
.Xr catgets 3 ,
.Xr gettext 3 Pq Pa devel/gettext ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3
.Sh BUGS
This man page is incomplete.
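.Sh EXAMPLES
The first approach described under
.Sx Support for Multi-byte Encodings ,
converting input to wide characters right after reading it, might look
roughly like the sketch below.
The
.Fn to_wide
helper is a hypothetical name used only for illustration; it relies on
.Xr mbsrtowcs 3 ,
whose result depends on the
.Ev LC_CTYPE
category of the current locale.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Convert a multibyte string to a newly allocated wide-character
 * string according to the current LC_CTYPE locale.  On success the
 * number of wide characters (excluding the terminator) is stored in
 * *out_len if out_len is not NULL.  Returns NULL if the input holds
 * an invalid multibyte sequence or memory cannot be allocated.
 * The caller frees the result with free().
 */
static wchar_t *
to_wide(const char *mbs, size_t *out_len)
{
	mbstate_t state;
	const char *src;
	wchar_t *ws;
	size_t n;

	assert(mbs != NULL);

	/* First pass: a NULL destination only counts the characters. */
	memset(&state, 0, sizeof(state));
	src = mbs;
	n = mbsrtowcs(NULL, &src, 0, &state);
	if (n == (size_t)-1)
		return NULL;

	ws = malloc((n + 1) * sizeof(*ws));
	if (ws == NULL)
		return NULL;

	/* Second pass: convert, starting again from the initial state. */
	memset(&state, 0, sizeof(state));
	src = mbs;
	if (mbsrtowcs(ws, &src, n + 1, &state) == (size_t)-1) {
		free(ws);
		return NULL;
	}
	if (out_len != NULL)
		*out_len = n;
	return ws;
}
```

The reverse direction, immediately before writing, would use
.Xr wcsrtombs 3
in the same two-pass fashion.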