.\"	$NetBSD: nls.7,v 1.15 2009/04/09 02:51:54 joerg Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd November 24, 2013
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides commands for a single
worldwide operating system base.
An internationalized system has no built-in assumptions or dependencies
on language-specific or cultural-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.It
Message-text language
.It
Character sets
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
(often abbreviated
.Dq i18n )
refers to the operation by which system software is developed to support
multiple cultural-specific and language-specific conventions.
This is a generalization process by which the system is untied from
using only English strings or other English-specific conventions.
.Dq Localization
(often abbreviated
.Dq l10n )
refers to the operations by which the user environment is customized to
handle its input and output in a manner appropriate for specific language
and cultural conventions.
This is a specialization process, by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of cultural conventions for some country, together
with all associated translations targeted to the native language, is
called the
.Dq locale .
.Pp
.Dx
provides extensive support to programmers and system developers to
enable internationalized software to be developed.
.Dx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
ISO C and POSIX together specify the following six standard categories
supported by
.Dx :
.Pp
.Bl -tag -compact -width ".Ev LC_MONETARY"
.It Ev LC_COLLATE
string-collation order information
.It Ev LC_CTYPE
character classification, case conversion, and other character attributes
.It Ev LC_MESSAGES
the format for affirmative and negative responses
.It Ev LC_MONETARY
rules and symbols for formatting monetary numeric information
.It Ev LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It Ev LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of directory names
where the message catalog files of the NLS database are located.
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale.
.Pp
The value of each of these environment variables is a string of the form:
.Bd -literal
	language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-character codes for many languages.
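.Pp
As a minimal sketch, the name format shown above could be split into its
fields in C as follows.
The
.Vt struct locale_parts
type and the
.Fn parse_locale_name
and
.Fn copy_field
helpers are hypothetical names introduced for illustration only.
They are not part of any system library, and the field sizes are
arbitrary assumptions.
.Bd -literal
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: the four fields of a locale name. */
struct locale_parts {
	char language[16];
	char territory[16];
	char codeset[32];
	char modifier[32];
};

/* Copy at most dstsize - 1 bytes of src[0..len) and terminate. */
static void
copy_field(char *dst, size_t dstsize, const char *src, size_t len)
{
	if (len >= dstsize)
		len = dstsize - 1;
	memcpy(dst, src, len);
	dst[len] = 0;
}

/*
 * Split "language[_territory][.codeset][@modifier]" into its fields.
 * Missing optional fields are left as empty strings.
 * Returns 0 on success, -1 if the language field is empty.
 */
static int
parse_locale_name(const char *name, struct locale_parts *out)
{
	const char *p;
	size_t len;

	memset(out, 0, sizeof(*out));

	/* The language runs up to the first delimiter or the end. */
	len = strcspn(name, "_.@");
	if (len == 0)
		return -1;
	copy_field(out->language, sizeof(out->language), name, len);
	p = name + len;

	if (*p == '_') {
		p++;
		len = strcspn(p, ".@");
		copy_field(out->territory, sizeof(out->territory), p, len);
		p += len;
	}
	if (*p == '.') {
		p++;
		len = strcspn(p, "@");
		copy_field(out->codeset, sizeof(out->codeset), p, len);
		p += len;
	}
	if (*p == '@') {
		p++;
		copy_field(out->modifier, sizeof(out->modifier), p, strlen(p));
	}
	return 0;
}
```
.Ed
.Pp
With this sketch, a name such as
.Dq da_DK.ISO8859-1
is split into the language
.Dq da ,
the territory
.Dq DK ,
and the codeset
.Dq ISO8859-1 ,
with an empty modifier.
.Pp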
Some common language codes are:
.Bl -column "PERSIAN (farsi)" "Sy Code" "OCEANIC/INDONESIAN"
.It Sy Language Name Ta Sy Code Ta Sy Language Family
.It ABKHAZIAN Ta AB Ta IBERO-CAUCASIAN
.It AFAN (OROMO) Ta OM Ta HAMITIC
.It AFAR Ta AA Ta HAMITIC
.It AFRIKAANS Ta AF Ta GERMANIC
.It ALBANIAN Ta SQ Ta INDO-EUROPEAN (OTHER)
.It AMHARIC Ta AM Ta SEMITIC
.It ARABIC Ta AR Ta SEMITIC
.It ARMENIAN Ta HY Ta INDO-EUROPEAN (OTHER)
.It ASSAMESE Ta AS Ta INDIAN
.It AYMARA Ta AY Ta AMERINDIAN
.It AZERBAIJANI Ta AZ Ta TURKIC/ALTAIC
.It BASHKIR Ta BA Ta TURKIC/ALTAIC
.It BASQUE Ta EU Ta BASQUE
.It BENGALI Ta BN Ta INDIAN
.It BHUTANI Ta DZ Ta ASIAN
.It BIHARI Ta BH Ta INDIAN
.It BISLAMA Ta BI Ta ""
.It BRETON Ta BR Ta CELTIC
.It BULGARIAN Ta BG Ta SLAVIC
.It BURMESE Ta MY Ta ASIAN
.It BYELORUSSIAN Ta BE Ta SLAVIC
.It CAMBODIAN Ta KM Ta ASIAN
.It CATALAN Ta CA Ta ROMANCE
.It CHINESE Ta ZH Ta ASIAN
.It CORSICAN Ta CO Ta ROMANCE
.It CROATIAN Ta HR Ta SLAVIC
.It CZECH Ta CS Ta SLAVIC
.It DANISH Ta DA Ta GERMANIC
.It DUTCH Ta NL Ta GERMANIC
.It ENGLISH Ta EN Ta GERMANIC
.It ESPERANTO Ta EO Ta INTERNATIONAL AUX.
.It ESTONIAN Ta ET Ta FINNO-UGRIC
.It FAROESE Ta FO Ta GERMANIC
.It FIJI Ta FJ Ta OCEANIC/INDONESIAN
.It FINNISH Ta FI Ta FINNO-UGRIC
.It FRENCH Ta FR Ta ROMANCE
.It FRISIAN Ta FY Ta GERMANIC
.It GALICIAN Ta GL Ta ROMANCE
.It GEORGIAN Ta KA Ta IBERO-CAUCASIAN
.It GERMAN Ta DE Ta GERMANIC
.It GREEK Ta EL Ta LATIN/GREEK
.It GREENLANDIC Ta KL Ta ESKIMO
.It GUARANI Ta GN Ta AMERINDIAN
.It GUJARATI Ta GU Ta INDIAN
.It HAUSA Ta HA Ta NEGRO-AFRICAN
.It HEBREW Ta HE Ta SEMITIC
.It HINDI Ta HI Ta INDIAN
.It HUNGARIAN Ta HU Ta FINNO-UGRIC
.It ICELANDIC Ta IS Ta GERMANIC
.It INDONESIAN Ta ID Ta OCEANIC/INDONESIAN
.It INTERLINGUA Ta IA Ta INTERNATIONAL AUX.
.It INTERLINGUE Ta IE Ta INTERNATIONAL AUX.
.It INUKTITUT Ta IU Ta ""
.It INUPIAK Ta IK Ta ESKIMO
.It IRISH Ta GA Ta CELTIC
.It ITALIAN Ta IT Ta ROMANCE
.It JAPANESE Ta JA Ta ASIAN
.It JAVANESE Ta JV Ta OCEANIC/INDONESIAN
.It KANNADA Ta KN Ta DRAVIDIAN
.It KASHMIRI Ta KS Ta INDIAN
.It KAZAKH Ta KK Ta TURKIC/ALTAIC
.It KINYARWANDA Ta RW Ta NEGRO-AFRICAN
.It KIRGHIZ Ta KY Ta TURKIC/ALTAIC
.It KIRUNDI Ta RN Ta NEGRO-AFRICAN
.It KOREAN Ta KO Ta ASIAN
.It KURDISH Ta KU Ta IRANIAN
.It LAOTHIAN Ta LO Ta ASIAN
.It LATIN Ta LA Ta LATIN/GREEK
.It LATVIAN Ta LV Ta BALTIC
.It LINGALA Ta LN Ta NEGRO-AFRICAN
.It LITHUANIAN Ta LT Ta BALTIC
.It MACEDONIAN Ta MK Ta SLAVIC
.It MALAGASY Ta MG Ta OCEANIC/INDONESIAN
.It MALAY Ta MS Ta OCEANIC/INDONESIAN
.It MALAYALAM Ta ML Ta DRAVIDIAN
.It MALTESE Ta MT Ta SEMITIC
.It MAORI Ta MI Ta OCEANIC/INDONESIAN
.It MARATHI Ta MR Ta INDIAN
.It MOLDAVIAN Ta MO Ta ROMANCE
.It MONGOLIAN Ta MN Ta ""
.It NAURU Ta NA Ta ""
.It NEPALI Ta NE Ta INDIAN
.It NORWEGIAN Ta NO Ta GERMANIC
.It OCCITAN Ta OC Ta ROMANCE
.It ORIYA Ta OR Ta INDIAN
.It PASHTO Ta PS Ta IRANIAN
.It PERSIAN (farsi) Ta FA Ta IRANIAN
.It POLISH Ta PL Ta SLAVIC
.It PORTUGUESE Ta PT Ta ROMANCE
.It PUNJABI Ta PA Ta INDIAN
.It QUECHUA Ta QU Ta AMERINDIAN
.It RHAETO-ROMANCE Ta RM Ta ROMANCE
.It ROMANIAN Ta RO Ta ROMANCE
.It RUSSIAN Ta RU Ta SLAVIC
.It SAMOAN Ta SM Ta OCEANIC/INDONESIAN
.It SANGHO Ta SG Ta NEGRO-AFRICAN
.It SANSKRIT Ta SA Ta INDIAN
.It SCOTS GAELIC Ta GD Ta CELTIC
.It SERBIAN Ta SR Ta SLAVIC
.It SERBO-CROATIAN Ta SH Ta SLAVIC
.It SESOTHO Ta ST Ta NEGRO-AFRICAN
.It SETSWANA Ta TN Ta NEGRO-AFRICAN
.It SHONA Ta SN Ta NEGRO-AFRICAN
.It SINDHI Ta SD Ta INDIAN
.It SINGHALESE Ta SI Ta INDIAN
.It SISWATI Ta SS Ta NEGRO-AFRICAN
.It SLOVAK Ta SK Ta SLAVIC
.It SLOVENIAN Ta SL Ta SLAVIC
.It SOMALI Ta SO Ta HAMITIC
.It SPANISH Ta ES Ta ROMANCE
.It SUNDANESE Ta SU Ta OCEANIC/INDONESIAN
.It SWAHILI Ta SW Ta NEGRO-AFRICAN
.It SWEDISH Ta SV Ta GERMANIC
.It TAGALOG Ta TL Ta OCEANIC/INDONESIAN
.It TAJIK Ta TG Ta IRANIAN
.It TAMIL Ta TA Ta DRAVIDIAN
.It TATAR Ta TT Ta TURKIC/ALTAIC
.It TELUGU Ta TE Ta DRAVIDIAN
.It THAI Ta TH Ta ASIAN
.It TIBETAN Ta BO Ta ASIAN
.It TIGRINYA Ta TI Ta SEMITIC
.It TONGA Ta TO Ta OCEANIC/INDONESIAN
.It TSONGA Ta TS Ta NEGRO-AFRICAN
.It TURKISH Ta TR Ta TURKIC/ALTAIC
.It TURKMEN Ta TK Ta TURKIC/ALTAIC
.It TWI Ta TW Ta NEGRO-AFRICAN
.It UIGUR Ta UG Ta ""
.It UKRAINIAN Ta UK Ta SLAVIC
.It URDU Ta UR Ta INDIAN
.It UZBEK Ta UZ Ta TURKIC/ALTAIC
.It VIETNAMESE Ta VI Ta ASIAN
.It VOLAPUK Ta VO Ta INTERNATIONAL AUX.
.It WELSH Ta CY Ta CELTIC
.It WOLOF Ta WO Ta NEGRO-AFRICAN
.It XHOSA Ta XH Ta NEGRO-AFRICAN
.It YIDDISH Ta YI Ta GERMANIC
.It YORUBA Ta YO Ta NEGRO-AFRICAN
.It ZHUANG Ta ZA Ta ""
.It ZULU Ta ZU Ta NEGRO-AFRICAN
.El
.Pp
For example, the locale for the Danish language spoken in Denmark
using the ISO 8859-1 character set is da_DK.ISO8859-1.
The da stands for the Danish language and the DK stands for Denmark.
The short form of da_DK is sufficient to indicate this locale.
.Pp
The environment variable settings are queried by their priority level
in the following manner:
.Bl -bullet
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
Only the
.Ev LANG
environment variable should be set in /etc/profile, since this makes it
easiest for the user to override the system default using the individual
.Ev LC_*
variables.
.It
If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the ASCII character set and defines
information for the six categories.
.El
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a
particular language makes up a character set.
It is the encoding values in a character set that provide
the interface between the system and its input and output devices.
.Pp
The following character sets are supported in
.Dx :
.Bl -tag -width ISO_8859_family
.It ASCII
The American Standard Code for Information Interchange (ASCII) standard
specifies 128 Roman characters and control codes, encoded in a 7-bit
character encoding scheme.
.It ISO 8859 family
Industry-standard character sets specified by the ISO/IEC 8859
standard.
The standard is divided into 15 numbered parts, with each
part specifying broad script similarities.
Examples include Western European, Central European, Arabic, Cyrillic,
Hebrew, Greek, and Turkish.
The character sets use an 8-bit character encoding scheme which is
compatible with the ASCII character set.
.It Unicode
The Unicode character set is the full set of known abstract characters of
all real-world scripts.
It can be used in environments where multiple
scripts must be processed simultaneously.
Unicode is compatible with ISO 8859-1 (Western European) and ASCII.
Many character encoding schemes are available for Unicode, including UTF-8,
UTF-16 and UTF-32.
These encoding schemes are multi-byte encodings.
The UTF-8 encoding scheme uses 8-bit, variable-width encodings which are
compatible with ASCII.
The UTF-16 encoding scheme uses 16-bit, variable-width encodings.
The UTF-32 encoding scheme uses 32-bit, fixed-width encodings.
.El
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for a
corresponding character in a character set.
A display must support a suitable font to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
.Xr xterm 1
includes support for Unicode with UTF-8 encoding.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Dx
.Xr syscons 4
console provides support for loading a variety of fonts using the
.Xr vidcontrol 1
utility.
Available fonts can be found in
.Pa /usr/share/syscons/fonts .
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages, as needed.
.Pp
.Dx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
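.Pp
As a rough sketch of how a program might retrieve a message through the
.Xr catgets 3
interface, consider the following.
The catalog name
.Dq nls_example ,
the set and message numbers, and the
.Fn get_greeting
helper are all hypothetical and chosen only for illustration.
.Bd -literal
```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical catalog name and message numbering, for illustration. */
#define EX_CATALOG	"nls_example"
#define EX_SET		1
#define EX_MSG_HELLO	1

/*
 * Copy the greeting into buf, falling back to a built-in English
 * string when the catalog or the message cannot be found.
 * The caller is expected to have called setlocale(LC_ALL, "") first
 * so that NL_CAT_LOCALE selects the user's LC_MESSAGES locale.
 */
static void
get_greeting(char *buf, size_t bufsize)
{
	const char *fallback = "Hello, world";
	nl_catd catd;

	catd = catopen(EX_CATALOG, NL_CAT_LOCALE);
	if (catd == (nl_catd)-1) {
		/* No catalog available: use the built-in default. */
		snprintf(buf, bufsize, "%s", fallback);
		return;
	}
	/* catgets() returns the fallback if the message is missing. */
	snprintf(buf, bufsize, "%s",
	    catgets(catd, EX_SET, EX_MSG_HELLO, fallback));
	catclose(catd);
}
```
.Ed
.Pp
The message is copied out before
.Xr catclose 3
is called because the pointer returned by
.Xr catgets 3
may refer to storage owned by the catalog.
.Pp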
The
.Xr catgets 3
interface has the advantage that it belongs to a standard which is
well supported.
Unfortunately the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also doesn't support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is being supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
.Ss Support for Multi-byte Encodings
Some character sets with multi-byte encodings may be difficult to decode,
or may contain state (i.e., adjacent characters are dependent).
ISO C specifies a set of functions using 'wide characters' which can handle
multi-byte encodings properly.
The behaviour of these functions is affected
by the
.Ev LC_CTYPE
category of the current locale.
.Pp
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
There are two types for wide characters:
.Em wchar_t
and
.Em wint_t .
.Em wchar_t
is a type which can contain one wide character and operates as the 'char'
type does for one character.
.Em wint_t
can contain one wide character or WEOF (wide EOF).
.Pp
There are functions that operate on
.Em wchar_t ,
and substitute for functions operating on 'char'.
See
.Xr wmemchr 3
and
.Xr towlower 3
for details.
There are some additional functions that operate on
.Em wchar_t .
See
.Xr wctype 3
and
.Xr wctrans 3
for details.
.Pp
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
The two primary issues requiring special use of wide characters are:
.Bl -bullet -offset indent
.It
All I/O is performed using multibyte characters.
Input data is converted into wide characters immediately after
reading and data for output is converted from wide characters to
multi-byte encoding immediately before writing.
Conversion is controlled by
.Xr mbstowcs 3 ,
.Xr mbsrtowcs 3 ,
.Xr wcstombs 3 ,
.Xr wcsrtombs 3 ,
.Xr mblen 3 ,
.Xr mbrlen 3 ,
and
.Xr mbsinit 3 .
.It
Wide characters are used directly for I/O, using
.Xr getwchar 3 ,
.Xr fgetwc 3 ,
.Xr getwc 3 ,
.Xr ungetwc 3 ,
.Xr fgetws 3 ,
.Xr putwchar 3 ,
.Xr fputwc 3 ,
.Xr putwc 3 ,
and
.Xr fputws 3 .
They are also used by the formatted I/O functions for wide characters
such as
.Xr fwscanf 3 ,
.Xr wscanf 3 ,
.Xr swscanf 3 ,
.Xr fwprintf 3 ,
.Xr wprintf 3 ,
.Xr swprintf 3 ,
.Xr vfwprintf 3 ,
.Xr vwprintf 3 ,
and
.Xr vswprintf 3 ,
and by the wide-character conversion specifiers %lc, %C, %ls, and %S of
the conventional formatted I/O functions.
.El
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr vidcontrol 1 ,
.Xr xfd 1 ,
.Xr xterm 1 ,
.Xr catgets 3 ,
.Xr gettext 3 Pq Pa devel/gettext ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3
.Sh BUGS
This man page is incomplete.
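.Sh EXAMPLES
The first approach described under
.Sx Support for Multi-byte Encodings ,
converting multibyte input to wide characters right after reading, might
be sketched as follows.
The
.Fn mbs_to_wcs_dup
helper is a hypothetical name used for illustration.
The sketch assumes that
.Xr mbstowcs 3
returns the required length when called with a
.Dv NULL
destination, which POSIX specifies as an extension to ISO C.
.Bd -literal
```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Convert a multibyte string to a newly allocated wide-character
 * string, using the LC_CTYPE category of the current locale.
 * On success the number of wide characters (excluding the
 * terminator) is stored in *nchars when nchars is not NULL.
 * Returns NULL on an invalid sequence or when out of memory.
 * The caller frees the result with free().
 */
static wchar_t *
mbs_to_wcs_dup(const char *mbs, size_t *nchars)
{
	wchar_t *wcs;
	size_t n;

	/* First pass: measure without storing anything. */
	n = mbstowcs(NULL, mbs, 0);
	if (n == (size_t)-1)
		return NULL;

	wcs = malloc((n + 1) * sizeof(*wcs));
	if (wcs == NULL)
		return NULL;

	/* Second pass: convert, including the terminating null. */
	mbstowcs(wcs, mbs, n + 1);
	if (nchars != NULL)
		*nchars = n;
	return wcs;
}
```
.Ed
.Pp
A program would normally call
.Xr setlocale 3
with an empty locale string before using such a helper, so that the
conversion follows the user's
.Ev LC_CTYPE
setting rather than the C locale.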