l10n_intro(5)
NAME
l10n_intro, l10n, locales, LOCPATH - Introduction to localization (L10N)
DESCRIPTION
Localization refers to the process of establishing information within a
computer system specific to each supported language, cultural data, and
coded character set (codeset) combination. Each such combination gives
rise to the definition of one locale. The abbreviation L10N is often
used to stand for localization, as there are 10 characters between the
beginning "L" and the ending "N" of that word.
See i18n_intro(5) for introductory information about internationalization
and how to use system commands to set a locale. See localedef(1),
charmap(4), and locale(4) for information about creating locales. See
Writing Software for the International Market for information about
creating locales and writing applications that use locales.
The current release of the operating system supports the following
languages with locales. Each language is discussed separately in its own
reference page.
Catalan
Chinese (Simplified and Traditional)
Czech
Danish
Dutch
English (discussed in this reference page)
Finnish
Flemish
French
German
Greek
Hebrew
Hungarian
Icelandic
Italian
Japanese
Korean
Lithuanian
Norwegian
Polish
Portuguese
Russian
Slovak
Slovene
Spanish
Swedish
Thai
Turkish
For some languages, more than one codeset or more than one country or
territory is supported, so these languages have multiple locales. The
following list describes all the supported locales. For information about
the character encoding used by a particular locale, see the reference page
for the codeset specified in the last part of the locale name or, for
locales that use the UTF-8 codeset, see Unicode(5).
Catalan locale for Spain (uses the Latin-1 codeset)
Catalan locale for Spain (uses the Latin-9 codeset)
Catalan locale for Spain (uses the UTF-8 codeset)
Czech locale for the Czech Republic (uses the Latin-2 codeset)
Czech locale for the Czech Republic (uses the UTF-8 codeset)
Danish locale for Denmark (uses the Latin-1 codeset)
Danish locale for Denmark (uses the Latin-9 codeset)
Danish locale for Denmark (uses the UTF-8 codeset)
German locale for Switzerland (uses the Latin-1 codeset)
German locale for Switzerland (uses the Latin-9 codeset)
German locale for Switzerland (uses the UTF-8 codeset)
German locale for Germany (uses the Latin-1 codeset)
German locale for Germany (uses the Latin-9 codeset)
German locale for Germany (uses the UTF-8 codeset)
Greek locale for Greece (uses the ISO Greek codeset)
Greek locale for Greece (uses the UTF-8 codeset)
English locale that includes the euro character (uses the UTF-8 codeset)
   This locale supports the euro character and defines the decimal point
   as a comma (,) and the thousands separator as a period (.). Therefore,
   when assigned only to the LC_MONETARY locale category or environment
   variable, this locale is useful in many European countries, not just
   those in which English is the native language.
English locale for Great Britain (uses the Latin-1 codeset)
English locale for Great Britain (uses the Latin-9 codeset)
English locale for Great Britain (uses the UTF-8 codeset)
English locale for the United States (uses the Latin-1 codeset)
English locale for the United States (uses the Latin-9 codeset)
English locale for the United States (uses cp850 encoding)
   Use this locale with data that contains accented characters and that
   was generated on a PC using the cp850 code page for character encoding.
   This character encoding is usually the default for the DOS and Windows
   operating systems in Europe. The en_US.ISO8859-1 and en_US.cp850
   locales encode English characters the same way but use different
   values for accented and other non-English characters in the Latin-1
   character set.
English locale for the United States (uses the UTF-8 codeset)
English locale for the United States (uses the UTF-8 codeset)
   The @euro variant defines the local currency sign to be the euro
   character and the international currency sign to be EUR. See also
   en_EU.UTF-8@euro.
Spanish locale for Spain (uses the Latin-1 codeset)
Spanish locale for Spain (uses the Latin-9 codeset)
Spanish locale for Spain (uses the UTF-8 codeset)
Finnish locale for Finland (uses the Latin-1 codeset)
Finnish locale for Finland (uses the Latin-9 codeset)
Finnish locale for Finland (uses the UTF-8 codeset)
French locale for Belgium (uses the Latin-1 codeset)
French locale for Belgium (uses the Latin-9 codeset)
French locale for Belgium (uses the UTF-8 codeset)
French locale for Canada (uses the Latin-1 codeset)
French locale for Canada (uses the Latin-9 codeset)
French locale for Canada (uses the UTF-8 codeset)
French locale for Switzerland (uses the Latin-1 codeset)
French locale for Switzerland (uses the Latin-9 codeset)
French locale for Switzerland (uses the UTF-8 codeset)
French locale for France (uses the Latin-1 codeset)
French locale for France (uses the Latin-9 codeset)
French locale for France (uses the UTF-8 codeset)
Hebrew locale for Israel (uses the ISO Hebrew codeset)
Hungarian locale for Hungary (uses the Latin-2 codeset)
Hungarian locale for Hungary (uses the UTF-8 codeset)
Icelandic locale for Iceland (uses the Latin-1 codeset)
Icelandic locale for Iceland (uses the Latin-9 codeset)
Icelandic locale for Iceland (uses the UTF-8 codeset)
Italian locale for Italy (uses the Latin-1 codeset)
Italian locale for Italy (uses the Latin-9 codeset)
Italian locale for Italy (uses the UTF-8 codeset)
Hebrew locale for Israel (uses the ISO Hebrew codeset)
   This locale name is supported for backward compatibility. The
   recommended name to use for the ISO Hebrew locale is he_IL.ISO8859-8.
Japanese locale for Japan (uses the DEC Kanji codeset)
Japanese locale for Japan (uses the Japanese EUC codeset)
Japanese locale for Japan (uses the Super DEC Kanji codeset)
Japanese locale for Japan (uses the Shift JIS codeset)
Japanese locale for Japan (uses the UTF-8 codeset)
Korean locale for Korea (uses the DEC Korean codeset)
Korean locale for Korea (uses the Korean EUC codeset)
Korean locale for Korea (uses the UTF-8 codeset)
Lithuanian locale for Lithuania (uses the Latin-4 codeset)
Lithuanian locale for Lithuania (uses the UTF-8 codeset)
Flemish locale for Belgium (uses the Latin-1 codeset)
Flemish locale for Belgium (uses the Latin-9 codeset)
Flemish locale for Belgium (uses the UTF-8 codeset)
Dutch locale for the Netherlands (uses the Latin-1 codeset)
Dutch locale for the Netherlands (uses the Latin-9 codeset)
Dutch locale for the Netherlands (uses the UTF-8 codeset)
Norwegian locale for Norway (uses the Latin-1 codeset)
Norwegian locale for Norway (uses the Latin-9 codeset)
Norwegian locale for Norway (uses the UTF-8 codeset)
Polish locale for Poland (uses the Latin-2 codeset)
Polish locale for Poland (uses the UTF-8 codeset)
Portuguese locale for Portugal (uses the Latin-1 codeset)
Portuguese locale for Portugal (uses the Latin-9 codeset)
Portuguese locale for Portugal (uses the UTF-8 codeset)
Russian locale for Russia (uses the ISO Cyrillic codeset)
Russian locale for Russia (uses the UTF-8 codeset)
Slovak locale for Slovakia (uses the Latin-2 codeset)
Slovak locale for Slovakia (uses the UTF-8 codeset)
Slovene locale for Slovenia (uses the Latin-2 codeset)
Slovene locale for Slovenia (uses the UTF-8 codeset)
Swedish locale for Sweden (uses the Latin-1 codeset)
Swedish locale for Sweden (uses the Latin-9 codeset)
Swedish locale for Sweden (uses the UTF-8 codeset)
Thai locale for Thailand (uses the TACTIS codeset)
Turkish locale for Turkey (uses the Latin-5 codeset)
Turkish locale for Turkey (uses the UTF-8 codeset)
Simplified Chinese locale for the People's Republic of China (uses the DEC Hanzi codeset)
Simplified Chinese locale for the People's Republic of China (uses the GBK codeset, an extension of the GB 2312-80 codeset)
Simplified Chinese locale for the People's Republic of China (uses the GB18030 codeset, which extends GBK with 4-byte encodings)
Simplified Chinese locale for the People's Republic of China (uses the UTF-8 codeset)
Traditional Chinese locale for Hong Kong (uses the BIG-5 codeset)
Traditional Chinese locale for Hong Kong (uses the DEC Hanyu codeset)
Simplified Chinese locale for Hong Kong (uses the DEC Hanzi codeset)
Traditional Chinese locale for Hong Kong (uses the Taiwanese EUC codeset)
Traditional Chinese locale for Hong Kong (uses the UTF-8 codeset)
Traditional Chinese locale for Taiwan (uses the BIG-5 codeset)
Traditional Chinese locale for Taiwan (uses the DEC Hanyu codeset)
Traditional Chinese locale for Taiwan (uses the Taiwanese EUC codeset)
Traditional Chinese locale for Taiwan (uses the UTF-8 codeset)
   This locale supports Simplified Chinese as well as Traditional
   Chinese.
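The locale names cited on this page (for example, he_IL.ISO8859-8, zh_CN.dechanzi with its @pinyin variant, and en_EU.UTF-8@euro) suggest the layout language_territory.codeset@modifier, with the codeset as the last dotted part of the name. The following Python sketch assumes that layout and splits a name into its parts, which can help when looking up the codeset reference page for a locale. The parse_locale_name() helper and the LocaleName type are hypothetical, not part of any system interface.

```python
import re
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]
    modifier: Optional[str]


# Assumed layout: language[_territory][.codeset][@modifier]
_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def parse_locale_name(name: str) -> LocaleName:
    """Split a locale name into its parts; parts that are absent are None."""
    match = _LOCALE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized locale name: {name!r}")
    return LocaleName(**match.groupdict())


for example in ("he_IL.ISO8859-8", "zh_CN.dechanzi@pinyin", "en_EU.UTF-8@euro", "C"):
    print(example, "->", parse_locale_name(example))
```

Names such as C, POSIX, and universal.UTF-8 also parse under this layout; they simply have no territory part.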
For the zh_CN.dechanzi locale, the @pinyin, @radical, and @stroke variants
are available for sorting by pinyin, radical, and stroke, respectively. For
the zh_TW.big5, zh_TW.dechanyu, and zh_TW.eucTW locales, the @chuyin,
@radical, and @stroke variants are available for sorting by chuyin,
radical, and stroke, respectively. These variant locale names (those that
include the @collation_modifier suffix) can be assigned to the LC_COLLATE
variable.
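As a minimal sketch of using a collation variant for LC_COLLATE only, the following Python function sorts strings under another locale's collation rules and then restores the previous LC_COLLATE setting. The function name is hypothetical; it relies on the standard locale module, whose setlocale() and strxfrm() wrap the C library functions of the same names.

```python
import locale


def sort_with_collation(words, collate_locale):
    """Sort words using the LC_COLLATE rules of collate_locale.

    Only LC_COLLATE is changed, and its previous value is restored.
    locale.Error is raised if collate_locale is not installed.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, collate_locale)
        # strxfrm() maps each string to a key that compares like strcoll()
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)
```

On a system with the Taiwanese locales installed, sort_with_collation(words, "zh_TW.big5@radical") would sort by radical. One caveat: Python hands its strings to the C library as Unicode code points in wchar_t form, which matches the wchar_t encoding of Unicode locales but not that of dense code locales (see System Locales), so results under a dense code locale may be wrong.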
The and locales are the only locales that include the euro monetary
symbol in the coded character set. The *.UTF-8@euro locales also define
the local currency symbol to be the euro character and the international
currency symbol to be EUR. See euro(5) for more information
about the euro symbol and how it is supported.
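The following Python sketch shows the approach described for the English euro locale: assign a locale to LC_MONETARY alone, so that it controls currency formatting while the other categories keep their settings. The helper name is hypothetical; locale.setlocale() and locale.localeconv() are the standard Python wrappers for setlocale(3) and localeconv(3). Whether en_EU.UTF-8@euro is installed depends on the system.

```python
import locale


def use_monetary_locale(name):
    """Assign name to LC_MONETARY only; return True if the locale exists.

    Other categories (LC_NUMERIC, LC_TIME, LC_COLLATE, ...) keep their
    current values, so only monetary formatting information changes.
    """
    try:
        locale.setlocale(locale.LC_MONETARY, name)
    except locale.Error:
        return False
    return True


if use_monetary_locale("en_EU.UTF-8@euro"):
    conv = locale.localeconv()
    # Monetary fields now come from the euro locale
    print(repr(conv["currency_symbol"]), repr(conv["int_curr_symbol"]))
else:
    print("en_EU.UTF-8@euro is not installed on this system")
```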
You can use the -a option with the locale command to list all the
locales available on the system. The POSIX (or C) locale is always
available because it must exist on all systems that conform to The Open
Group's UNIX specifications. The POSIX locale is the default locale
when locale variables are not set.
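A minimal Python sketch, assuming that locale -a prints one locale name per line: the program asks which locales are installed and falls back to the POSIX locale, which is always available, when none of its preferred locales is present. Both helper names are hypothetical.

```python
import subprocess


def installed_locales():
    """Return the locale names printed by `locale -a`.

    Falls back to the C and POSIX locales, which always exist, if the
    command cannot be run.
    """
    try:
        result = subprocess.run(
            ["locale", "-a"], capture_output=True, text=True,
            errors="replace", check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return ["C", "POSIX"]
    return result.stdout.split()


def choose_locale(preferred, available):
    """Return the first preferred locale that is installed, else "POSIX"."""
    installed = set(available)
    for name in preferred:
        if name in installed:
            return name
    return "POSIX"


print(choose_locale(["fr_FR.UTF-8", "fr_FR.ISO8859-15"], installed_locales()))
```

Some systems print normalized names with locale -a (glibc, for example, lists UTF-8 locales with a .utf8 suffix), so a real program may need to normalize names before comparing them.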
Note
The dxterm terminal emulator does not support locales based on the
Unicode (UTF-8) or Latin-9 (ISO8859-15) codesets. With locales based on
these codesets, use dtterm, the default terminal emulator for the Common
Desktop Environment (CDE).
System Locales
When you install Worldwide Language Support, localization is supported
by two types of locales: Unicode locales and dense code locales.
Unicode locales conform to Unicode and ISO/IEC 10646 standards and use
UTF-32 as the wide character encoding. Under UTF-32 wide character
encoding, wchar_t values represent the same characters regardless of
the locale and, because Unicode standards prevail, implementation is
consistent across platforms.
Locales whose names end in use file code and internal process code
(wchar_t encoding) defined in the ISO 10646 and Unicode standards.
Other, non-UTF-8 Unicode locales use traditional UNIX and proprietary
codesets for the file code while using UTF-32 as the internal process
code. A subset of these Unicode locales have a @ucs4 modifier; however,
they are the same as the locales without the @ucs4 modifier. The
@ucs4 subset is provided for backward compatibility and may be removed
in the future. You cannot select @ucs4 locales from the CDE login menu;
you must specify the locale name in the LANG environment variable.
The universal.UTF-8 locale is also available (for use by applications
rather than end users). It supports the complete set of characters in
the universal character set (UCS).
See Unicode(5) for more information about encoding formats.
For locales, file code may include characters encoded in more than 1
byte; therefore, use these locales in applications that can process
multibyte data. Design new applications around multibyte locales, which
provide a large character repertoire, so that the application can support
additional characters in the future without a change of character set.
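Because a character in a UTF-8 locale can occupy more than one byte, code that cuts text at a byte limit must avoid splitting a character. The following Python sketch (the truncate_utf8() name is hypothetical) backs up from the byte limit to the start of a character; it relies on the UTF-8 rule that continuation bytes have the bit pattern 10xxxxxx.

```python
def truncate_utf8(text, max_bytes):
    """Return the longest prefix of text whose UTF-8 form fits in max_bytes."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    cut = max_bytes
    # Move the cut point back while it sits on a continuation byte (10xxxxxx)
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut].decode("utf-8")


greek = "Ελλάδα"
print(len(greek), len(greek.encode("utf-8")))  # 6 characters, 12 bytes
print(truncate_utf8("€uro", 4))                # €u (the euro sign is 3 bytes)
```

Truncating the byte string directly at 4 bytes would work here by chance, but at 2 bytes it would leave an incomplete euro sign that cannot be decoded.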
Dense code locales use dense code for wide character encoding to minimize
table size (that is, codepoints are assigned consecutively with no empty
positions). Under dense code locales, a wchar_t value for one locale may
not represent the same character in another locale; that is, the wchar_t
encoding is locale specific. Dense code locales are appropriate for
applications that have no dependencies on the internal process code, or
for applications that require better performance, because dense code
locales are slightly more efficient than Unicode locales.
All valid codepoints in multibyte character sets are mapped to valid
codepoints in Unicode; codepoints that have no standard Unicode equivalent
are mapped to codepoints in the Unicode private use area. Thus, dense code
locales can represent the same characters as Unicode locales. In general,
the same charmaps and locale source can be used for Unicode and dense code
locales. However, characters that are not defined in the LC_COLLATE
section may be sorted differently in Unicode and dense code locales.
A Unicode locale exists for each dense code locale. (However, not all
Unicode locales have a dense code version.) For Latin-1 locales
(ISO8859-1), the dense code and Unicode locales are identical because
Latin-1 characters are the same as the first 256 characters in Unicode.
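The Latin-1 claim can be checked directly. This Python sketch confirms that each Latin-1 byte value equals the Unicode code point it decodes to, and it also shows the difference between the Latin-1 and cp850 encodings mentioned for the en_US.ISO8859-1 and en_US.cp850 locales. It uses only Python's built-in latin-1 and cp850 codecs; the helper name is hypothetical.

```python
def latin1_matches_unicode():
    """True if every Latin-1 byte decodes to the code point with the same value."""
    return all(bytes([b]).decode("latin-1") == chr(b) for b in range(256))


print(latin1_matches_unicode())                      # True
print("é".encode("latin-1").hex(), hex(ord("é")))    # e9 0xe9
print("é".encode("cp850").hex())                     # 82, a different value
print("A".encode("latin-1") == "A".encode("cp850"))  # True, ASCII is shared
```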
The operating system also supports three UCS transformation formats
(UTFs): UTF-8, UTF-16, and UTF-32, all of which are defined in the Unicode
standard. See Unicode(5) for a full description of Unicode, UCS-4, and the
transformation formats.
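To make the three transformation formats concrete, this Python sketch encodes one character from the Basic Multilingual Plane and one from outside it. The utf-16-be and utf-32-be codecs are used so that no byte order mark is emitted; the helper name is hypothetical. The comments show the expected hexadecimal output.

```python
def utf_forms(ch):
    """Return the UTF-8, UTF-16, and UTF-32 forms of ch as hex (big-endian, no BOM)."""
    return {
        "UTF-8": ch.encode("utf-8").hex(),
        "UTF-16": ch.encode("utf-16-be").hex(),
        "UTF-32": ch.encode("utf-32-be").hex(),
    }


# EURO SIGN, U+20AC
print(utf_forms("\u20ac"))
# {'UTF-8': 'e282ac', 'UTF-16': '20ac', 'UTF-32': '000020ac'}

# First CJK Extension B ideograph, U+20000 (needs a surrogate pair in UTF-16)
print(utf_forms("\U00020000"))
# {'UTF-8': 'f0a08080', 'UTF-16': 'd840dc00', 'UTF-32': '00020000'}
```

UTF-32 always uses 4 bytes per character, which is why it serves as the fixed-width wchar_t encoding of Unicode locales; UTF-8 and UTF-16 are variable-width.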
The Unicode locales are installed in /usr/i18n/lib/nls/ucsloc/. Dense
code locales are installed in /usr/i18n/lib/nls/loc/. A symbolic link,
/usr/i18n/lib/nls/dloc, points to the system default locales. For example,
the Japanese locale filename, /usr/lib/nls/loc/ja_JP.eucJP, is a symbolic
link to /usr/i18n/lib/nls/dloc/ja_JP.eucJP, where /dloc is a symbolic link
to either /ucsloc for the Unicode version or /loc for the dense code
version of the Japanese locale. Keep in mind that the same locale name can
refer to either a Unicode locale or a dense code locale, depending on the
setting of the symbolic link. Thus, if an application behaves unexpectedly
when run in a locale, check the symbolic link.
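To check which kind of locale a name resolves to, a program or administrator can read the dloc link. The following Python sketch assumes, as described above, that the link target ends in ucsloc for Unicode locales or loc for dense code locales; the function name is hypothetical.

```python
import os

DLOC = "/usr/i18n/lib/nls/dloc"  # path taken from this reference page


def dloc_flavor(link=DLOC):
    """Return "unicode", "dense", or "unknown" for the dloc symbolic link."""
    try:
        target = os.readlink(link)
    except OSError:  # missing, not a symbolic link, or unreadable
        return "unknown"
    # Compare only the last component, ignoring any trailing slash
    last = os.path.basename(os.path.normpath(target))
    return {"ucsloc": "unicode", "loc": "dense"}.get(last, "unknown")


print(dloc_flavor())
```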
Because Unicode locales use consistent values for characters in wchar_t
form, a default link to Unicode locales can increase consistency across
locales and platforms. However, some users may prefer the older, dense
code locales that use proprietary algorithms to convert characters to
wchar_t form, or an application may have dependencies on dense code
wchar_t encoding. To switch between Unicode and dense code locales, the
system administrator, as root, uses i18nconfig to change the systemwide
default or manually changes the symbolic link /usr/i18n/lib/nls/dloc
from to
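The supported way to switch the systemwide default is i18nconfig. If the link is changed manually, replacing it in a single step avoids a moment in which the link does not exist. The following Python sketch (a hypothetical helper that must be run as root against the real link) creates a new link beside the old one and renames it into place; rename(2) replaces the old link itself rather than the directory it points to.

```python
import os


def repoint_symlink(link, new_target):
    """Point an existing symbolic link at new_target in one atomic step."""
    tmp = link + ".new"
    if os.path.lexists(tmp):  # clean up a leftover from an interrupted run
        os.remove(tmp)
    os.symlink(new_target, tmp)
    os.replace(tmp, link)  # atomic rename over the old link


# Example (as root; whether the existing link uses absolute or relative
# targets is an assumption to check first):
# repoint_symlink("/usr/i18n/lib/nls/dloc", "/usr/i18n/lib/nls/ucsloc")
```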
Environment Variables Related to Localization
The following system environment variables can be set (usually only by
installed applications or by programmers who are testing applications or
converters under development) to override the default search path for
certain kinds of localized files:
LOCPATH
   Specifies the search path for locales and codeset converters. This
   environment variable is not defined by current industry standards. See
   iconv_intro(5), iconv_open(3), and setlocale(3) for more information.
   Because the LOCPATH variable is not defined by standards, use it only
   when testing locales or converters under development, not as a
   systemwide method for finding installed converters or locales. When you
   set LOCPATH, make sure that the search path is valid for both locales
   and converters. Otherwise, in environments where both kinds of files
   are required, application and system software can find only the locales
   or only the converters.
NLSPATH
   Specifies the search path for message catalogs, which contain
   translated text for programs. This variable is used primarily by the
   catopen() function. See catopen(3) for detailed information on NLSPATH.
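When testing a locale or converter under development, setting LOCPATH only in the environment of the program under test is safer than setting it in a login profile. This Python sketch builds such an environment, with the development directories searched first and followed by directories that must still contain the installed locales and converters. The helper name, the directory names in the comment, and the colon-separated format of LOCPATH are assumptions.

```python
import os


def env_with_locpath(dev_dirs, installed_dirs, base=None):
    """Return a copy of an environment whose LOCPATH searches dev_dirs first.

    installed_dirs must still cover the installed locales AND the installed
    codeset converters, so that software needing either keeps working.
    Assumes LOCPATH is a colon-separated list, like PATH.
    """
    env = dict(os.environ if base is None else base)
    env["LOCPATH"] = ":".join([*dev_dirs, *installed_dirs])
    return env


# Hypothetical directories and program, for illustration only:
# env = env_with_locpath(["/home/me/loc-build"],
#                        ["/path/to/installed/locales", "/path/to/installed/converters"])
# subprocess.run(["./my_test_program"], env=env, check=True)
```

Passing the returned mapping as the env argument of a subprocess call leaves the calling process, and the rest of the system, unaffected.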
Customizing Locales
Partial source files, along with an associated Makefile, are available for
many locales in the /usr/lib/nls/loc/src directory. By editing one of these
source files and using the Makefile to rebuild the locale (make
locale_name), you can customize one or more of the following features:
The format of affirmative and negative responses (LC_MESSAGES section)
Rules and symbols for formatting monetary numeric information (LC_MONETARY section)
Rules and symbols for formatting nonmonetary numeric information (LC_NUMERIC section)
Rules and symbols for formatting date and time information (LC_TIME section)
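To compare the customizable values before and after rebuilding a locale, a program can query them at run time. This Python sketch (a hypothetical helper, UNIX only because it uses nl_langinfo()) reads a few values from each of the LC_MESSAGES, LC_MONETARY, LC_NUMERIC, and LC_TIME sections and then restores the original locale.

```python
import locale


def customizable_settings(name):
    """Return sample values from the customizable sections of locale `name`.

    The process locale is restored afterwards; locale.Error is raised if
    `name` is not installed.
    """
    saved = locale.setlocale(locale.LC_ALL)  # query only, no change
    try:
        locale.setlocale(locale.LC_ALL, name)
        conv = locale.localeconv()
        return {
            "yesexpr": locale.nl_langinfo(locale.YESEXPR),   # LC_MESSAGES
            "noexpr": locale.nl_langinfo(locale.NOEXPR),     # LC_MESSAGES
            "currency_symbol": conv["currency_symbol"],      # LC_MONETARY
            "mon_decimal_point": conv["mon_decimal_point"],  # LC_MONETARY
            "decimal_point": conv["decimal_point"],          # LC_NUMERIC
            "thousands_sep": conv["thousands_sep"],          # LC_NUMERIC
            "d_t_fmt": locale.nl_langinfo(locale.D_T_FMT),   # LC_TIME
        }
    finally:
        locale.setlocale(locale.LC_ALL, saved)


print(customizable_settings("C"))
```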
As described in locale(4), the LC_CTYPE and LC_COLLATE sections of
these locale sources are not customizable using this method. This means
that you cannot use one of these sources to change how characters are
classified or collated. By implication, this also means that you cannot
add a new character to a locale that does not already support it. For
example, you cannot add the European monetary character (euro) to a
locale that does not already support that character. However, you can
edit the LC_MONETARY section to define a string identifier for euro by
using characters that the locale does support. For example, you could
replace the existing monetary symbol with EUR.
See locale(4) for more information on a locale source file. See Writing
Software for the International Market for information on user
customization of LC_CTYPE and LC_COLLATE.
Caution
Customized versions of locales that are provided with the operating
system are not preserved when the operating system is reinstalled, even
when an update installation procedure is used. Therefore, you must back
up files for customized locales and their sources before reinstalling
the operating system. After the reinstallation is complete, you must
restore your customized locales to the system. If the newly installed
sources have revisions when compared to the old sources, it might be
preferable to apply your customizations to the newly installed sources
and rebuild your customized locales.
SEE ALSO
Commands: locale(1), localedef(1)
Functions: catopen(3)
Files: charmap(4), locale(4)
Others: Catalan(5), Chinese(5), Czech(5), dechanyu(5), dechanzi(5),
deckanji(5), deckorean(5), Dutch(5), eucJP(5), eucKR(5), eucTW(5),
euro(5), Finnish(5), French(5), GB18030(5), GBK(5), German(5),
Greek(5), Hebrew(5), Hungarian(5), i18n_intro(5), i18n_printing(5),
Icelandic(5), iconv_intro(5), iso2022(5), iso2022jp(5), iso8859-1(5),
iso8859-2(5), iso8859-4(5), iso8859-5(5), iso8859-7(5), iso8859-8(5),
iso8859-9(5), iso8859-15(5), Italian(5), Japanese(5), jiskanji(5),
Korean(5), Lithuanian(5), Norwegian(5), Polish(5), Portuguese(5),
Russian(5), sbig5(5), sdeckanji(5), shiftjis(5), Slovak(5), Slovene(5),
Spanish(5), Swedish(5), TACTIS(5), telecode(5), Thai(5), Turkish(5),
Unicode(5)
Writing Software for the International Market
Using International Software