.\"	$NetBSD: nls.7,v 1.11 2003/06/26 11:55:56 wiz Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"        This product includes software developed by the NetBSD
.\"        Foundation, Inc. and its contributors.
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\" $DragonFly: src/share/man/man7/nls.7,v 1.7 2008/05/02 02:05:06 swildner Exp $
.\"
.Dd May 17, 2003
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides commands for a single
worldwide operating system base.
An internationalized system has no built-in assumptions or dependencies
on language-specific or cultural-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.It
Message-text language
.It
Character sets
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
(often abbreviated
.Dq i18n )
refers to the operation by which system software is developed to support
multiple cultural-specific and language-specific conventions.
This is a generalization process by which the system is untied from
calling only English strings or other English-specific conventions.
.Dq Localization
(often abbreviated
.Dq l10n )
refers to the operations by which the user environment is customized to
handle its input and output appropriate for specific language and cultural
conventions.
This is a specialization process, by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of cultural conventions for some country, together
with all associated translations targeted to the native language, is
called the
.Dq locale .
.Pp
.Dx
provides extensive support to programmers and system developers to
enable internationalized software to be developed.
.Dx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
ISO C, together with POSIX, which adds
.Dv LC_MESSAGES ,
specifies the following six standard categories supported by
.Dx :
.Pp
.Bl -tag -compact -width LC_MONETARYXX
.It LC_COLLATE
string-collation order information
.It LC_CTYPE
character classification, case conversion, and other character attributes
.It LC_MESSAGES
the format for affirmative and negative responses
.It LC_MONETARY
rules and symbols for formatting monetary numeric information
.It LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of directory names
where the message catalog files of the NLS database are located.
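.Pp
As a rough sketch of how a program picks up these settings, the fragment
below passes an empty locale name to
.Xr setlocale 3 ,
which asks the C library to consult the environment, and then queries the
name in effect for a single category.
The helper name
.Fn category_locale_from_env
is hypothetical and exists only for this illustration.

```c
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of any library.
 * Adopts the locale named by the environment and returns the locale
 * name now in effect for one category, such as LC_TIME.
 * Returns NULL if the environment names a locale that is not available;
 * in that case the program's locale is left unchanged.
 */
const char *
category_locale_from_env(int category)
{
	/* An empty name means: take the locale from LC_ALL, LC_*, LANG. */
	if (setlocale(LC_ALL, "") == NULL)
		return NULL;

	/* A NULL name queries the current setting without changing it. */
	return setlocale(category, NULL);
}
```

With
.Ev LC_ALL
set to C in the environment, a call such as
.Fn category_locale_from_env LC_TIME
would be expected to return the string C.
.Pp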
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale.
.Pp
The value of each of these environment variables is a string of the form:
.Bd -literal
	language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-character codes for many languages.
Some common language codes are:
.Pp
.nf
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
\fILanguage Name\fP	\fICode\fP	\fILanguage Family\fP
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
.sp 5p
ABKHAZIAN	AB	IBERO-CAUCASIAN
AFAN (OROMO)	OM	HAMITIC
AFAR	AA	HAMITIC
AFRIKAANS	AF	GERMANIC
ALBANIAN	SQ	INDO-EUROPEAN (OTHER)
AMHARIC	AM	SEMITIC
ARABIC	AR	SEMITIC
ARMENIAN	HY	INDO-EUROPEAN (OTHER)
ASSAMESE	AS	INDIAN
AYMARA	AY	AMERINDIAN
AZERBAIJANI	AZ	TURKIC/ALTAIC
BASHKIR	BA	TURKIC/ALTAIC
BASQUE	EU	BASQUE
BENGALI	BN	INDIAN
BHUTANI	DZ	ASIAN
BIHARI	BH	INDIAN
BISLAMA	BI
BRETON	BR	CELTIC
BULGARIAN	BG	SLAVIC
BURMESE	MY	ASIAN
BYELORUSSIAN	BE	SLAVIC
CAMBODIAN	KM	ASIAN
CATALAN	CA	ROMANCE
CHINESE	ZH	ASIAN
CORSICAN	CO	ROMANCE
CROATIAN	HR	SLAVIC
CZECH	CS	SLAVIC
DANISH	DA	GERMANIC
DUTCH	NL	GERMANIC
ENGLISH	EN	GERMANIC
ESPERANTO	EO	INTERNATIONAL AUX.
ESTONIAN	ET	FINNO-UGRIC
FAROESE	FO	GERMANIC
FIJI	FJ	OCEANIC/INDONESIAN
FINNISH	FI	FINNO-UGRIC
FRENCH	FR	ROMANCE
FRISIAN	FY	GERMANIC
GALICIAN	GL	ROMANCE
GEORGIAN	KA	IBERO-CAUCASIAN
GERMAN	DE	GERMANIC
GREEK	EL	LATIN/GREEK
GREENLANDIC	KL	ESKIMO
GUARANI	GN	AMERINDIAN
GUJARATI	GU	INDIAN
HAUSA	HA	NEGRO-AFRICAN
HEBREW	HE	SEMITIC
HINDI	HI	INDIAN
HUNGARIAN	HU	FINNO-UGRIC
ICELANDIC	IS	GERMANIC
INDONESIAN	ID	OCEANIC/INDONESIAN
INTERLINGUA	IA	INTERNATIONAL AUX.
INTERLINGUE	IE	INTERNATIONAL AUX.
INUKTITUT	IU
INUPIAK	IK	ESKIMO
IRISH	GA	CELTIC
ITALIAN	IT	ROMANCE
JAPANESE	JA	ASIAN
JAVANESE	JV	OCEANIC/INDONESIAN
KANNADA	KN	DRAVIDIAN
KASHMIRI	KS	INDIAN
KAZAKH	KK	TURKIC/ALTAIC
KINYARWANDA	RW	NEGRO-AFRICAN
KIRGHIZ	KY	TURKIC/ALTAIC
KURUNDI	RN	NEGRO-AFRICAN
KOREAN	KO	ASIAN
KURDISH	KU	IRANIAN
LAOTHIAN	LO	ASIAN
LATIN	LA	LATIN/GREEK
LATVIAN	LV	BALTIC
LINGALA	LN	NEGRO-AFRICAN
LITHUANIAN	LT	BALTIC
MACEDONIAN	MK	SLAVIC
MALAGASY	MG	OCEANIC/INDONESIAN
MALAY	MS	OCEANIC/INDONESIAN
MALAYALAM	ML	DRAVIDIAN
MALTESE	MT	SEMITIC
MAORI	MI	OCEANIC/INDONESIAN
MARATHI	MR	INDIAN
MOLDAVIAN	MO	ROMANCE
MONGOLIAN	MN
NAURU	NA
NEPALI	NE	INDIAN
NORWEGIAN	NO	GERMANIC
OCCITAN	OC	ROMANCE
ORIYA	OR	INDIAN
PASHTO	PS	IRANIAN
PERSIAN (farsi)	FA	IRANIAN
POLISH	PL	SLAVIC
PORTUGUESE	PT	ROMANCE
PUNJABI	PA	INDIAN
QUECHUA	QU	AMERINDIAN
RHAETO-ROMANCE	RM	ROMANCE
ROMANIAN	RO	ROMANCE
RUSSIAN	RU	SLAVIC
SAMOAN	SM	OCEANIC/INDONESIAN
SANGHO	SG	NEGRO-AFRICAN
SANSKRIT	SA	INDIAN
SCOTS GAELIC	GD	CELTIC
SERBIAN	SR	SLAVIC
SERBO-CROATIAN	SH	SLAVIC
SESOTHO	ST	NEGRO-AFRICAN
SETSWANA	TN	NEGRO-AFRICAN
SHONA	SN	NEGRO-AFRICAN
SINDHI	SD	INDIAN
SINGHALESE	SI	INDIAN
SISWATI	SS	NEGRO-AFRICAN
SLOVAK	SK	SLAVIC
SLOVENIAN	SL	SLAVIC
SOMALI	SO	HAMITIC
SPANISH	ES	ROMANCE
SUNDANESE	SU	OCEANIC/INDONESIAN
SWAHILI	SW	NEGRO-AFRICAN
SWEDISH	SV	GERMANIC
TAGALOG	TL	OCEANIC/INDONESIAN
TAJIK	TG	IRANIAN
TAMIL	TA	DRAVIDIAN
TATAR	TT	TURKIC/ALTAIC
TELUGU	TE	DRAVIDIAN
THAI	TH	ASIAN
TIBETAN	BO	ASIAN
TIGRINYA	TI	SEMITIC
TONGA	TO	OCEANIC/INDONESIAN
TSONGA	TS	NEGRO-AFRICAN
TURKISH	TR	TURKIC/ALTAIC
TURKMEN	TK	TURKIC/ALTAIC
TWI	TW	NEGRO-AFRICAN
UIGUR	UG
UKRAINIAN	UK	SLAVIC
URDU	UR	INDIAN
UZBEK	UZ	TURKIC/ALTAIC
VIETNAMESE	VI	ASIAN
VOLAPUK	VO	INTERNATIONAL AUX.
WELSH	CY	CELTIC
WOLOF	WO	NEGRO-AFRICAN
XHOSA	XH	NEGRO-AFRICAN
YIDDISH	YI	GERMANIC
YORUBA	YO	NEGRO-AFRICAN
ZHUANG	ZA
ZULU	ZU	NEGRO-AFRICAN
.ta
.fi
.Pp
For example, the locale for the Danish language spoken in Denmark
using the ISO8859-1 character set is da_DK.ISO8859-1.
The da stands for the Danish language and the DK stands for Denmark.
The short form of da_DK is sufficient to indicate this locale.
.Pp
The environment variable settings are queried by their priority level
in the following manner:
.Bl -bullet
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
Only the
.Ev LANG
environment variable should be set in
.Pa /etc/profile ,
since this makes it easiest for the user to override the system default
using the individual
.Ev LC_*
variables.
.It
If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the 7-bit ASCII character set and defines
information for the six categories.
.El
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a
particular language makes up a character set.
It is the encoding values in a character set that provide
the interface between the system and its input and output devices.
.Pp
The following character sets are supported in
.Dx :
.Bl -tag -width ISO8859_family
.It ISO8859 family
Industry-standard character sets are provided by means of the ISO8859
family of character sets, which provide a range of single-byte character set
support that includes Latin-1, Latin-2, Arabic, Cyrillic, Hebrew,
Greek, and Turkish.
The eucJP character set is the industry-standard character set used to support
the Japanese locale.
.It Unicode
A Unicode environment based on the UTF-8 character set is supported for all
supported languages and territories.
UTF-8 provides character support for most of the major languages of the
world and can be used in environments where multiple languages must be
processed simultaneously.
.El
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for the
corresponding characters in a character set.
A display must support a suitable font to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
.Xr xterm 1
includes support for UTF-8 character sets.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Dx
.Xr syscons 4
console provides support for loading a variety of fonts using the
.Xr vidcontrol 1
utility.
Available fonts can be found in
.Pa /usr/share/syscons/fonts .
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
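.Pp
As a rough illustration of this separation, the sketch below looks a
message up in a catalog with
.Xr catopen 3
and
.Xr catgets 3
and falls back to a built-in English string when no catalog is found.
The catalog name, the set and message numbers, and both helper names are
assumptions made for this example only.

```c
#include <locale.h>
#include <nl_types.h>

/* All three values are assumptions made for this example. */
#define EXAMPLE_CATALOG	"example_app"	/* hypothetical catalog name */
#define EXAMPLE_SET	1		/* hypothetical set number */
#define EXAMPLE_MSG	1		/* hypothetical message number */

/* Built-in text used whenever no translated message can be found. */
static const char fallback_greeting[] = "Hello, world";

/*
 * Open the example catalog for the user's locale.
 * Returns (nl_catd)-1 if no matching catalog can be located.
 */
nl_catd
open_example_catalog(void)
{
	/* Adopt the environment's locale so the lookup can use it. */
	(void)setlocale(LC_ALL, "");
	return catopen(EXAMPLE_CATALOG, NL_CAT_LOCALE);
}

/*
 * Look up the greeting.  catgets(3) hands back its last argument
 * when the message cannot be retrieved.
 */
const char *
example_greeting(nl_catd catd)
{
	if (catd == (nl_catd)-1)
		return fallback_greeting;
	return catgets(catd, EXAMPLE_SET, EXAMPLE_MSG, fallback_greeting);
}
```

A caller would typically pass the result of
.Fn open_example_catalog
to
.Fn example_greeting
and release a successfully opened catalog with
.Xr catclose 3
when done.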
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages, as needed.
.Pp
.Dx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a standard which is
well supported.
Unfortunately, the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also does not support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
.Ss Support for Multibyte Characters and Wide Characters
Character sets with multibyte characters may be difficult to decode, or may
contain state (i.e., adjacent characters are dependent).
ISO C specifies a set of functions using
.Dq wide characters
which can handle multibyte characters properly.
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
.Pp
There are two types for wide characters:
.Em wchar_t
and
.Em wint_t .
.Em wchar_t
is a type which can contain one wide character and is used in the same
way that
.Em char
is used for one single-byte character.
.Em wint_t
can contain one wide character or WEOF (wide EOF).
.Pp
There are functions that operate on
.Em wchar_t
and substitute for the functions that operate on
.Em char .
See
.Xr wmemchr 3
and
.Xr towlower 3
for details.
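.Pp
As a minimal sketch of this substitution, the two helpers below lowercase
a wide string with
.Xr towlower 3
and count occurrences of a wide character with
.Xr wmemchr 3 ,
in the places where code for single-byte strings would use
.Xr tolower 3
and
.Xr memchr 3 .
Both helper names are hypothetical.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: lowercase a NUL-terminated wide string in place. */
void
wide_lowercase(wchar_t *s)
{
	for (; *s != 0; s++)
		*s = (wchar_t)towlower((wint_t)*s);
}

/*
 * Hypothetical helper: count how often c occurs in the first n wide
 * characters of s.
 */
size_t
wide_count(const wchar_t *s, size_t n, wchar_t c)
{
	const wchar_t *p;
	size_t hits = 0;

	while (n > 0 && (p = wmemchr(s, c, n)) != NULL) {
		hits++;
		/* Skip past the match and search the remainder. */
		n -= (size_t)(p - s) + 1;
		s = p + 1;
	}
	return hits;
}
```

Which characters
.Xr towlower 3
maps depends on the
.Dv LC_CTYPE
category of the current locale; in the C locale only the ASCII letters
are affected.
.Pp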
There are some additional functions that operate on
.Em wchar_t .
See
.Xr wctype 3
and
.Xr wctrans 3
for details.
.Pp
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
The two primary issues requiring special use of wide characters are:
.Bl -bullet -offset indent
.It
All I/O is performed using multibyte characters.
Input data is converted into wide characters immediately after
reading and data for output is converted from wide characters to
multibyte characters immediately before writing.
Conversion is achieved using
.Xr mbstowcs 3 ,
.Xr mbsrtowcs 3 ,
.Xr wcstombs 3 ,
.Xr wcsrtombs 3 ,
.Xr mblen 3 ,
.Xr mbrlen 3 ,
and
.Xr mbsinit 3 .
.It
Wide characters are used directly for I/O, using
.Xr getwchar 3 ,
.Xr fgetwc 3 ,
.Xr getwc 3 ,
.Xr ungetwc 3 ,
.Xr fgetws 3 ,
.Xr putwchar 3 ,
.Xr fputwc 3 ,
.Xr putwc 3 ,
and
.Xr fputws 3 .
They are also used with the formatted I/O functions for wide characters
such as
.Xr fwscanf 3 ,
.Xr wscanf 3 ,
.Xr swscanf 3 ,
.Xr fwprintf 3 ,
.Xr wprintf 3 ,
.Xr swprintf 3 ,
.Xr vfwprintf 3 ,
.Xr vwprintf 3 ,
and
.Xr vswprintf 3 ,
and with the wide-character conversion specifiers %lc, %C, %ls, and %S
in the conventional formatted I/O functions.
.El
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr vidcontrol 1 ,
.Xr xfd 1 ,
.Xr xterm 1 ,
.Xr catgets 3 ,
.Xr gettext 3 Pq Pa pkgsrc/devel/gettext ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3
.Sh BUGS
This man page is incomplete.