.\" $NetBSD: nls.7,v 1.11 2003/06/26 11:55:56 wiz Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"        This product includes software developed by the NetBSD
.\"        Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd May 17, 2003
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides commands for a single
worldwide operating system base.
An internationalized system has no built-in assumptions or dependencies
on language-specific or culture-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.It
Message-text language
.It
Character sets
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
(often abbreviated
.Dq i18n )
refers to the process by which system software is developed to support
multiple culture-specific and language-specific conventions.
This is a generalization process by which the system is freed from
relying only on English strings or other English-specific conventions.
.Dq Localization
(often abbreviated
.Dq l10n )
refers to the operations by which the user environment is customized to
handle input and output in a manner appropriate for a specific language
and its cultural conventions.
This is a specialization process, by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of cultural conventions for some country, together
with all associated translations targeted to the native language, is
called the
.Dq locale .
.Pp
.Dx
provides extensive support to programmers and system developers to
enable internationalized software to be developed.
.Dx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
ISO C specifies five of the following six standard categories and
POSIX adds LC_MESSAGES; all six are supported by
.Dx :
.Pp
.Bl -tag -compact -width LC_MONETARYXX
.It LC_COLLATE
string-collation order information
.It LC_CTYPE
character classification, case conversion, and other character attributes
.It LC_MESSAGES
the language of messages and the format for affirmative and negative
responses
.It LC_MONETARY
rules and symbols for formatting monetary numeric information
.It LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of directory names
where the message catalog files of the NLS database are located.
.Pp
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale.
The values of the locale environment variables are strings of the form:
.Bd -literal
	language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-letter codes for many languages.
Some common language codes are:
.Pp
.nf
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
\fILanguage Name\fP	\fICode\fP	\fILanguage Family\fP
.sp 5p
ABKHAZIAN	AB	IBERO-CAUCASIAN
AFAN (OROMO)	OM	HAMITIC
AFAR	AA	HAMITIC
AFRIKAANS	AF	GERMANIC
ALBANIAN	SQ	INDO-EUROPEAN (OTHER)
AMHARIC	AM	SEMITIC
ARABIC	AR	SEMITIC
ARMENIAN	HY	INDO-EUROPEAN (OTHER)
ASSAMESE	AS	INDIAN
AYMARA	AY	AMERINDIAN
AZERBAIJANI	AZ	TURKIC/ALTAIC
BASHKIR	BA	TURKIC/ALTAIC
BASQUE	EU	BASQUE
BENGALI	BN	INDIAN
BHUTANI	DZ	ASIAN
BIHARI	BH	INDIAN
BISLAMA	BI
BRETON	BR	CELTIC
BULGARIAN	BG	SLAVIC
BURMESE	MY	ASIAN
BYELORUSSIAN	BE	SLAVIC
CAMBODIAN	KM	ASIAN
CATALAN	CA	ROMANCE
CHINESE	ZH	ASIAN
CORSICAN	CO	ROMANCE
CROATIAN	HR	SLAVIC
CZECH	CS	SLAVIC
DANISH	DA	GERMANIC
DUTCH	NL	GERMANIC
ENGLISH	EN	GERMANIC
ESPERANTO	EO	INTERNATIONAL AUX.
ESTONIAN	ET	FINNO-UGRIC
FAROESE	FO	GERMANIC
FIJI	FJ	OCEANIC/INDONESIAN
FINNISH	FI	FINNO-UGRIC
FRENCH	FR	ROMANCE
FRISIAN	FY	GERMANIC
GALICIAN	GL	ROMANCE
GEORGIAN	KA	IBERO-CAUCASIAN
GERMAN	DE	GERMANIC
GREEK	EL	LATIN/GREEK
GREENLANDIC	KL	ESKIMO
GUARANI	GN	AMERINDIAN
GUJARATI	GU	INDIAN
HAUSA	HA	NEGRO-AFRICAN
HEBREW	HE	SEMITIC
HINDI	HI	INDIAN
HUNGARIAN	HU	FINNO-UGRIC
ICELANDIC	IS	GERMANIC
INDONESIAN	ID	OCEANIC/INDONESIAN
INTERLINGUA	IA	INTERNATIONAL AUX.
INTERLINGUE	IE	INTERNATIONAL AUX.
INUKTITUT	IU
INUPIAK	IK	ESKIMO
IRISH	GA	CELTIC
ITALIAN	IT	ROMANCE
JAPANESE	JA	ASIAN
JAVANESE	JV	OCEANIC/INDONESIAN
KANNADA	KN	DRAVIDIAN
KASHMIRI	KS	INDIAN
KAZAKH	KK	TURKIC/ALTAIC
KINYARWANDA	RW	NEGRO-AFRICAN
KIRGHIZ	KY	TURKIC/ALTAIC
KIRUNDI	RN	NEGRO-AFRICAN
KOREAN	KO	ASIAN
KURDISH	KU	IRANIAN
LAOTHIAN	LO	ASIAN
LATIN	LA	LATIN/GREEK
LATVIAN	LV	BALTIC
LINGALA	LN	NEGRO-AFRICAN
LITHUANIAN	LT	BALTIC
MACEDONIAN	MK	SLAVIC
MALAGASY	MG	OCEANIC/INDONESIAN
MALAY	MS	OCEANIC/INDONESIAN
MALAYALAM	ML	DRAVIDIAN
MALTESE	MT	SEMITIC
MAORI	MI	OCEANIC/INDONESIAN
MARATHI	MR	INDIAN
MOLDAVIAN	MO	ROMANCE
MONGOLIAN	MN
NAURU	NA
NEPALI	NE	INDIAN
NORWEGIAN	NO	GERMANIC
OCCITAN	OC	ROMANCE
ORIYA	OR	INDIAN
PASHTO	PS	IRANIAN
PERSIAN (farsi)	FA	IRANIAN
POLISH	PL	SLAVIC
PORTUGUESE	PT	ROMANCE
PUNJABI	PA	INDIAN
QUECHUA	QU	AMERINDIAN
RHAETO-ROMANCE	RM	ROMANCE
ROMANIAN	RO	ROMANCE
RUSSIAN	RU	SLAVIC
SAMOAN	SM	OCEANIC/INDONESIAN
SANGHO	SG	NEGRO-AFRICAN
SANSKRIT	SA	INDIAN
SCOTS GAELIC	GD	CELTIC
SERBIAN	SR	SLAVIC
SERBO-CROATIAN	SH	SLAVIC
SESOTHO	ST	NEGRO-AFRICAN
SETSWANA	TN	NEGRO-AFRICAN
SHONA	SN	NEGRO-AFRICAN
SINDHI	SD	INDIAN
SINGHALESE	SI	INDIAN
SISWATI	SS	NEGRO-AFRICAN
SLOVAK	SK	SLAVIC
SLOVENIAN	SL	SLAVIC
SOMALI	SO	HAMITIC
SPANISH	ES	ROMANCE
SUNDANESE	SU	OCEANIC/INDONESIAN
SWAHILI	SW	NEGRO-AFRICAN
SWEDISH	SV	GERMANIC
TAGALOG	TL	OCEANIC/INDONESIAN
TAJIK	TG	IRANIAN
TAMIL	TA	DRAVIDIAN
TATAR	TT	TURKIC/ALTAIC
TELUGU	TE	DRAVIDIAN
THAI	TH	ASIAN
TIBETAN	BO	ASIAN
TIGRINYA	TI	SEMITIC
TONGA	TO	OCEANIC/INDONESIAN
TSONGA	TS	NEGRO-AFRICAN
TURKISH	TR	TURKIC/ALTAIC
TURKMEN	TK	TURKIC/ALTAIC
TWI	TW	NEGRO-AFRICAN
UIGUR	UG
UKRAINIAN	UK	SLAVIC
URDU	UR	INDIAN
UZBEK	UZ	TURKIC/ALTAIC
VIETNAMESE	VI	ASIAN
VOLAPUK	VO	INTERNATIONAL AUX.
WELSH	CY	CELTIC
WOLOF	WO	NEGRO-AFRICAN
XHOSA	XH	NEGRO-AFRICAN
YIDDISH	YI	GERMANIC
YORUBA	YO	NEGRO-AFRICAN
ZHUANG	ZA
ZULU	ZU	NEGRO-AFRICAN
.ta
.fi
.Pp
For example, the locale for the Danish language spoken in Denmark
using the ISO8859-1 character set is da_DK.ISO8859-1.
The da stands for the Danish language and the DK stands for Denmark.
Language codes are written in lowercase in locale names.
The short form of da_DK is sufficient to indicate this locale.
.Pp
The environment variable settings are queried by their priority level
in the following manner:
.Bl -bullet
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
Only the
.Ev LANG
environment variable should be set in
.Pa /etc/profile ,
since this makes it easiest for the user to override the system default
using the individual
.Ev LC_*
variables.
.It
If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the 7-bit ASCII character set and defines
information for the six categories.
.El
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a
particular language makes up a character set.
It is the encoding values in a character set that provide
the interface between the system and its input and output devices.
.Pp
The following character sets are supported in
.Dx :
.Bl -tag -width ISO8859_family
.It ISO8859 family
Industry-standard character sets are provided by means of the ISO8859
family of character sets, which provide a range of single-byte character set
support that includes Latin-1, Latin-2, Arabic, Cyrillic, Hebrew,
Greek, and Turkish.
.It eucJP
The eucJP character set is the industry-standard character set used to support
the Japanese locale.
.It Unicode
A Unicode environment based on the UTF-8 character set is supported for all
supported languages and territories.
UTF-8 provides character support for most of the major languages of the
world and can be used in environments where multiple languages must be
processed simultaneously.
.El
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for the
corresponding characters in a character set.
A display must support a suitable font to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
.Xr xterm 1
includes support for UTF-8 character sets.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Dx
.Xr syscons 4
console provides support for loading a variety of fonts using the
.Xr vidcontrol 1
utility.
Available fonts can be found in
.Pa /usr/share/syscons/fonts .
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages, as needed.
.Pp
.Dx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a standard which is
well supported.
Unfortunately, the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also does not support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
.Ss Support for Multibyte Characters and Wide Characters
Character sets with multibyte characters may be difficult to decode, or may
contain state (i.e., adjacent characters are dependent).
ISO C specifies a set of functions using
.Dq wide characters
which can handle multibyte characters properly.
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
.Pp
There are two types for wide characters:
.Em wchar_t
and
.Em wint_t .
.Em wchar_t
is a type which can contain one wide character and serves the same
purpose as the
.Em char
type does for a single-byte character.
.Em wint_t
can contain one wide character or WEOF (wide EOF).
.Pp
There are functions that operate on
.Em wchar_t
and take the place of functions operating on
.Em char .
See
.Xr wmemchr 3
and
.Xr towlower 3
for details.
There are some additional functions that operate on
.Em wchar_t .
See
.Xr wctype 3
and
.Xr wctrans 3
for details.
.Pp
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
The two primary ways of using wide characters for I/O are:
.Bl -bullet -offset indent
.It
All I/O is performed using multibyte characters.
Input data is converted into wide characters immediately after
reading and data for output is converted from wide characters to
multibyte characters immediately before writing.
Conversion is achieved using
.Xr mbstowcs 3 ,
.Xr mbsrtowcs 3 ,
.Xr wcstombs 3 ,
.Xr wcsrtombs 3 ,
.Xr mblen 3 ,
.Xr mbrlen 3 ,
and
.Xr mbsinit 3 .
.It
Wide characters are used directly for I/O, using
.Xr getwchar 3 ,
.Xr fgetwc 3 ,
.Xr getwc 3 ,
.Xr ungetwc 3 ,
.Xr fgetws 3 ,
.Xr putwchar 3 ,
.Xr fputwc 3 ,
.Xr putwc 3 ,
and
.Xr fputws 3 .
They are also used with the formatted I/O functions for wide characters
such as
.Xr fwscanf 3 ,
.Xr wscanf 3 ,
.Xr swscanf 3 ,
.Xr fwprintf 3 ,
.Xr wprintf 3 ,
.Xr swprintf 3 ,
.Xr vfwprintf 3 ,
.Xr vwprintf 3 ,
and
.Xr vswprintf 3 ,
and with the wide-character conversion specifiers %lc, %C, %ls, and %S
of the conventional formatted I/O functions.
.El
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr vidcontrol 1 ,
.Xr xfd 1 ,
.Xr xterm 1 ,
.Xr catgets 3 ,
.Xr gettext 3 Pq Pa devel/gettext ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3
.Sh BUGS
This man page is incomplete.