.\" $NetBSD: nls.7,v 1.15 2009/04/09 02:51:54 joerg Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd February 21, 2007
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides commands for a single
worldwide operating system base.
An internationalized system has no built-in assumptions or dependencies
on language-specific or cultural-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.It
Message-text language
.It
Character sets
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
(often abbreviated
.Dq i18n )
refers to the operation by which system software is developed to support
multiple cultural-specific and language-specific conventions.
This is a generalization process by which the system is untied from
using only English strings or other English-specific conventions.
.Dq Localization
(often abbreviated
.Dq l10n )
refers to the operations by which the user environment is customized to
handle its input and output appropriately for specific language and
cultural conventions.
This is a specialization process, by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of cultural conventions for some country, together
with all associated translations targeted to the native language, is
called the
.Dq locale .
.Pp
.Nx
provides extensive support to programmers and system developers to
enable internationalized software to be developed.
.Nx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
ISO C and POSIX together specify the following six standard categories
supported by
.Nx :
.Pp
.Bl -tag -compact -width LC_MONETARYXX
.It Ev LC_COLLATE
string-collation order information
.It Ev LC_CTYPE
character classification, case conversion, and other character attributes
.It Ev LC_MESSAGES
the format for affirmative and negative responses
.It Ev LC_MONETARY
rules and symbols for formatting monetary numeric information
.It Ev LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It Ev LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of directory names
where the message catalog files of the NLS database are located.
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale.
.Pp
The value of each of these environment variables is a string of the form:
.Pp
.Bd -literal
    language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-character codes for many languages.
Some common language codes are:
.Pp
.Bl -column "PERSIAN (farsi)" "Sy Code" "OCEANIC/INDONESIAN"
.It Sy Language Name Ta Sy Code Ta Sy Language Family
.It ABKHAZIAN Ta AB Ta IBERO-CAUCASIAN
.It AFAN (OROMO) Ta OM Ta HAMITIC
.It AFAR Ta AA Ta HAMITIC
.It AFRIKAANS Ta AF Ta GERMANIC
.It ALBANIAN Ta SQ Ta INDO-EUROPEAN (OTHER)
.It AMHARIC Ta AM Ta SEMITIC
.It ARABIC Ta AR Ta SEMITIC
.It ARMENIAN Ta HY Ta INDO-EUROPEAN (OTHER)
.It ASSAMESE Ta AS Ta INDIAN
.It AYMARA Ta AY Ta AMERINDIAN
.It AZERBAIJANI Ta AZ Ta TURKIC/ALTAIC
.It BASHKIR Ta BA Ta TURKIC/ALTAIC
.It BASQUE Ta EU Ta BASQUE
.It BENGALI Ta BN Ta INDIAN
.It BHUTANI Ta DZ Ta ASIAN
.It BIHARI Ta BH Ta INDIAN
.It BISLAMA Ta BI Ta ""
.It BRETON Ta BR Ta CELTIC
.It BULGARIAN Ta BG Ta SLAVIC
.It BURMESE Ta MY Ta ASIAN
.It BYELORUSSIAN Ta BE Ta SLAVIC
.It CAMBODIAN Ta KM Ta ASIAN
.It CATALAN Ta CA Ta ROMANCE
.It CHINESE Ta ZH Ta ASIAN
.It CORSICAN Ta CO Ta ROMANCE
.It CROATIAN Ta HR Ta SLAVIC
.It CZECH Ta CS Ta SLAVIC
.It DANISH Ta DA Ta GERMANIC
.It DUTCH Ta NL Ta GERMANIC
.It ENGLISH Ta EN Ta GERMANIC
.It ESPERANTO Ta EO Ta INTERNATIONAL AUX.
.It ESTONIAN Ta ET Ta FINNO-UGRIC
.It FAROESE Ta FO Ta GERMANIC
.It FIJI Ta FJ Ta OCEANIC/INDONESIAN
.It FINNISH Ta FI Ta FINNO-UGRIC
.It FRENCH Ta FR Ta ROMANCE
.It FRISIAN Ta FY Ta GERMANIC
.It GALICIAN Ta GL Ta ROMANCE
.It GEORGIAN Ta KA Ta IBERO-CAUCASIAN
.It GERMAN Ta DE Ta GERMANIC
.It GREEK Ta EL Ta LATIN/GREEK
.It GREENLANDIC Ta KL Ta ESKIMO
.It GUARANI Ta GN Ta AMERINDIAN
.It GUJARATI Ta GU Ta INDIAN
.It HAUSA Ta HA Ta NEGRO-AFRICAN
.It HEBREW Ta HE Ta SEMITIC
.It HINDI Ta HI Ta INDIAN
.It HUNGARIAN Ta HU Ta FINNO-UGRIC
.It ICELANDIC Ta IS Ta GERMANIC
.It INDONESIAN Ta ID Ta OCEANIC/INDONESIAN
.It INTERLINGUA Ta IA Ta INTERNATIONAL AUX.
.It INTERLINGUE Ta IE Ta INTERNATIONAL AUX.
.It INUKTITUT Ta IU Ta ""
.It INUPIAK Ta IK Ta ESKIMO
.It IRISH Ta GA Ta CELTIC
.It ITALIAN Ta IT Ta ROMANCE
.It JAPANESE Ta JA Ta ASIAN
.It JAVANESE Ta JV Ta OCEANIC/INDONESIAN
.It KANNADA Ta KN Ta DRAVIDIAN
.It KASHMIRI Ta KS Ta INDIAN
.It KAZAKH Ta KK Ta TURKIC/ALTAIC
.It KINYARWANDA Ta RW Ta NEGRO-AFRICAN
.It KIRGHIZ Ta KY Ta TURKIC/ALTAIC
.It KURUNDI Ta RN Ta NEGRO-AFRICAN
.It KOREAN Ta KO Ta ASIAN
.It KURDISH Ta KU Ta IRANIAN
.It LAOTHIAN Ta LO Ta ASIAN
.It LATIN Ta LA Ta LATIN/GREEK
.It LATVIAN Ta LV Ta BALTIC
.It LINGALA Ta LN Ta NEGRO-AFRICAN
.It LITHUANIAN Ta LT Ta BALTIC
.It MACEDONIAN Ta MK Ta SLAVIC
.It MALAGASY Ta MG Ta OCEANIC/INDONESIAN
.It MALAY Ta MS Ta OCEANIC/INDONESIAN
.It MALAYALAM Ta ML Ta DRAVIDIAN
.It MALTESE Ta MT Ta SEMITIC
.It MAORI Ta MI Ta OCEANIC/INDONESIAN
.It MARATHI Ta MR Ta INDIAN
.It MOLDAVIAN Ta MO Ta ROMANCE
.It MONGOLIAN Ta MN Ta ""
.It NAURU Ta NA Ta ""
.It NEPALI Ta NE Ta INDIAN
.It NORWEGIAN Ta NO Ta GERMANIC
.It OCCITAN Ta OC Ta ROMANCE
.It ORIYA Ta OR Ta INDIAN
.It PASHTO Ta PS Ta IRANIAN
.It PERSIAN (farsi) Ta FA Ta IRANIAN
.It POLISH Ta PL Ta SLAVIC
.It PORTUGUESE Ta PT Ta ROMANCE
.It PUNJABI Ta PA Ta INDIAN
.It QUECHUA Ta QU Ta AMERINDIAN
.It RHAETO-ROMANCE Ta RM Ta ROMANCE
.It ROMANIAN Ta RO Ta ROMANCE
.It RUSSIAN Ta RU Ta SLAVIC
.It SAMOAN Ta SM Ta OCEANIC/INDONESIAN
.It SANGHO Ta SG Ta NEGRO-AFRICAN
.It SANSKRIT Ta SA Ta INDIAN
.It SCOTS GAELIC Ta GD Ta CELTIC
.It SERBIAN Ta SR Ta SLAVIC
.It SERBO-CROATIAN Ta SH Ta SLAVIC
.It SESOTHO Ta ST Ta NEGRO-AFRICAN
.It SETSWANA Ta TN Ta NEGRO-AFRICAN
.It SHONA Ta SN Ta NEGRO-AFRICAN
.It SINDHI Ta SD Ta INDIAN
.It SINGHALESE Ta SI Ta INDIAN
.It SISWATI Ta SS Ta NEGRO-AFRICAN
.It SLOVAK Ta SK Ta SLAVIC
.It SLOVENIAN Ta SL Ta SLAVIC
.It SOMALI Ta SO Ta HAMITIC
.It SPANISH Ta ES Ta ROMANCE
.It SUNDANESE Ta SU Ta OCEANIC/INDONESIAN
.It SWAHILI Ta SW Ta NEGRO-AFRICAN
.It SWEDISH Ta SV Ta GERMANIC
.It TAGALOG Ta TL Ta OCEANIC/INDONESIAN
.It TAJIK Ta TG Ta IRANIAN
.It TAMIL Ta TA Ta DRAVIDIAN
.It TATAR Ta TT Ta TURKIC/ALTAIC
.It TELUGU Ta TE Ta DRAVIDIAN
.It THAI Ta TH Ta ASIAN
.It TIBETAN Ta BO Ta ASIAN
.It TIGRINYA Ta TI Ta SEMITIC
.It TONGA Ta TO Ta OCEANIC/INDONESIAN
.It TSONGA Ta TS Ta NEGRO-AFRICAN
.It TURKISH Ta TR Ta TURKIC/ALTAIC
.It TURKMEN Ta TK Ta TURKIC/ALTAIC
.It TWI Ta TW Ta NEGRO-AFRICAN
.It UIGUR Ta UG Ta ""
.It UKRAINIAN Ta UK Ta SLAVIC
.It URDU Ta UR Ta INDIAN
.It UZBEK Ta UZ Ta TURKIC/ALTAIC
.It VIETNAMESE Ta VI Ta ASIAN
.It VOLAPUK Ta VO Ta INTERNATIONAL AUX.
.It WELSH Ta CY Ta CELTIC
.It WOLOF Ta WO Ta NEGRO-AFRICAN
.It XHOSA Ta XH Ta NEGRO-AFRICAN
.It YIDDISH Ta YI Ta GERMANIC
.It YORUBA Ta YO Ta NEGRO-AFRICAN
.It ZHUANG Ta ZA Ta ""
.It ZULU Ta ZU Ta NEGRO-AFRICAN
.El
.Pp
For example, the locale for the Danish language spoken in Denmark
using the ISO 8859-1 character set is da_DK.ISO8859-1.
The da stands for the Danish language and the DK stands for Denmark.
The short form of da_DK is sufficient to indicate this locale.
.Pp
The environment variable settings are queried by their priority level
in the following manner:
.Pp
.Bl -bullet
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
Only the
.Ev LANG
environment variable should be set in
.Pa /etc/profile ,
since this makes it easiest for the user to override the system default
using the individual
.Ev LC_*
variables.
.It
If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the ASCII character set and defines
information for the six categories.
.El
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a
particular language makes up a character set.
It is the encoding values in a character set that provide
the interface between the system and its input and output devices.
.Pp
The following character sets are supported in
.Nx :
.Bl -tag -width ISO_8859_family
.It ASCII
The American Standard Code for Information Interchange (ASCII) standard
specifies 128 Roman characters and control codes, encoded in a 7-bit
character encoding scheme.
.It ISO 8859 family
Industry-standard character sets specified by the ISO/IEC 8859
standard.
The standard is divided into 15 numbered parts, with each
part covering a group of broadly similar scripts.
Examples include Western European, Central European, Arabic, Cyrillic,
Hebrew, Greek, and Turkish.
The character sets use an 8-bit character encoding scheme which is
compatible with the ASCII character set.
.It Unicode
The Unicode character set is the full set of known abstract characters of
all real-world scripts.
It can be used in environments where multiple
scripts must be processed simultaneously.
Unicode is compatible with ISO 8859-1 (Western European) and ASCII.
Many character encoding schemes are available for Unicode, including UTF-8,
UTF-16 and UTF-32.
These encoding schemes are multi-byte encodings.
The UTF-8 encoding scheme uses 8-bit, variable-width encodings which are
compatible with ASCII.
The UTF-16 encoding scheme uses 16-bit, variable-width encodings.
The UTF-32 encoding scheme uses 32-bit, fixed-width encodings.
.El
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for a
corresponding character in a character set.
A display must support a suitable font to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
.Xr xterm 1
includes support for Unicode with UTF-8 encoding.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Nx
.Xr wscons 4
console provides support for loading fonts using the
.Xr wsfontload 8
utility.
Currently, only fonts for the ISO8859-1 family of character sets are
supported.
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages, as needed.
.Pp
.Nx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a standard which is
well supported.
Unfortunately, the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also does not support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is being supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
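.Pp
As a minimal sketch of how these interfaces fit together, the following
C fragment selects the user's locale with
.Xr setlocale 3 ,
reports the codeset through
.Xr nl_langinfo 3 ,
and fetches one message with
.Xr catgets 3 .
The catalog name
.Dq nlsdemo ,
the set and message numbers, the helper
.Fn lookup_message ,
and the
.Dv NLS_DEMO_MAIN
guard are assumptions made for illustration only; they are not part of
.Nx .
.Bd -literal
```c
#include <langinfo.h>
#include <locale.h>
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical set and message numbers for an imaginary catalog. */
#define SET_MAIN  1
#define MSG_HELLO 1

/*
 * Return the translated message, or the built-in fallback text when
 * the catalog could not be opened or lacks the message.
 */
const char *
lookup_message(nl_catd catd, int set_id, int msg_id, const char *fallback)
{
	if (catd == (nl_catd)-1)
		return fallback;
	return catgets(catd, set_id, msg_id, fallback);
}

#ifdef NLS_DEMO_MAIN	/* hypothetical guard: build with -DNLS_DEMO_MAIN */
int
main(void)
{
	nl_catd catd;

	/*
	 * Adopt the locale named by LC_ALL, LC_* or LANG; on failure
	 * the program simply stays in the C locale.
	 */
	(void)setlocale(LC_ALL, "");

	/* Print the codeset of the selected LC_CTYPE category. */
	puts(nl_langinfo(CODESET));

	/* "nlsdemo" is an assumed catalog name, located via NLSPATH. */
	catd = catopen("nlsdemo", NL_CAT_LOCALE);
	puts(lookup_message(catd, SET_MAIN, MSG_HELLO, "Hello, world"));
	if (catd != (nl_catd)-1)
		catclose(catd);
	return 0;
}
#endif
```
.Ed
.Pp
Because the fallback string is returned whenever the catalog or the
message is missing, such a program still produces English output on a
system where no translation has been installed.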
.Ss Support for Multi-byte Encodings
Some character sets with multi-byte encodings may be difficult to decode,
or may contain state (i.e., adjacent characters are dependent).
ISO C specifies a set of functions using 'wide characters' which can handle
multi-byte encodings properly.
The behaviour of these functions is affected
by the
.Ev LC_CTYPE
category of the current locale.
.Pp
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
There are two types for wide characters:
.Em wchar_t
and
.Em wint_t .
.Em wchar_t
is a type which can contain one wide character and operates as the 'char'
type does for one character.
.Em wint_t
can contain one wide character or WEOF (wide EOF).
.Pp
There are functions that operate on
.Em wchar_t
and substitute for the functions operating on 'char'.
See
.Xr wmemchr 3
and
.Xr towlower 3
for details.
There are some additional functions that operate on
.Em wchar_t .
See
.Xr wctype 3
and
.Xr wctrans 3
for details.
.Pp
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
The two primary issues requiring special use of wide characters are:
.Bl -bullet -offset indent
.It
All I/O is performed using multi-byte characters.
Input data is converted into wide characters immediately after
reading and data for output is converted from wide characters to
multi-byte encoding immediately before writing.
Conversion is handled with the
.Xr mbstowcs 3 ,
.Xr mbsrtowcs 3 ,
.Xr wcstombs 3 ,
.Xr wcsrtombs 3 ,
.Xr mblen 3 ,
.Xr mbrlen 3 ,
and
.Xr mbsinit 3
functions.
.It
Wide characters are used directly for I/O, using
.Xr getwchar 3 ,
.Xr fgetwc 3 ,
.Xr getwc 3 ,
.Xr ungetwc 3 ,
.Xr fgetws 3 ,
.Xr putwchar 3 ,
.Xr fputwc 3 ,
.Xr putwc 3 ,
and
.Xr fputws 3 .
They are also used by the formatted I/O functions for wide characters
such as
.Xr fwscanf 3 ,
.Xr wscanf 3 ,
.Xr swscanf 3 ,
.Xr fwprintf 3 ,
.Xr wprintf 3 ,
.Xr swprintf 3 ,
.Xr vfwprintf 3 ,
.Xr vwprintf 3 ,
and
.Xr vswprintf 3 ,
and through the wide-character conversion specifiers %lc, %C, %ls, and %S
of the conventional formatted I/O functions.
.El
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr xfd 1 ,
.Xr xterm 1 ,
.Xr catgets 3 ,
.Xr gettext 3 ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3 ,
.Xr wsfontload 8
.Sh BUGS
This man page is incomplete.