GB18030(5)                                                      GB18030(5)
NAME
GB18030, gb18030 - A Chinese character set that extends GBK by means of
4-byte code points
DESCRIPTION
The GB18030-2000 character set, defined by the Chinese national
standards organization, is an extension of the GBK character set,
which itself is an extension of the GB2312-80 character set. (See the
GBK(5) reference page.)
GB18030 incorporates GBK and adds support for all the Hanzi (Chinese
ideographic) characters specified by the Unicode Version 3.0 and
ISO/IEC 10646-2001 standards.
GB18030 Code Space and Code Points
The GB18030 character set uses 1-byte, 2-byte, and 4-byte encodings
with the following structure:
───────────────────────────────────────────────────────────────────────
Number of Bytes  Code Space                           Total Code Points
───────────────────────────────────────────────────────────────────────
1-byte           0x00 to 0x7F                         128
2-byte           byte 1: 0x81 to 0xFE                 23940
                 byte 2: 0x40 to 0xFE (except 0x7F)
4-byte           byte 1: 0x81 to 0xFE                 1587600
                 byte 2: 0x30 to 0x39
                 byte 3: 0x81 to 0xFE
                 byte 4: 0x30 to 0x39
───────────────────────────────────────────────────────────────────────
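The totals in the last column follow from multiplying the number of
values permitted at each byte position: 0x81 to 0xFE allows 126
values, 0x30 to 0x39 allows 10 values, and 0x40 to 0xFE without 0x7F
allows 190 values.

```latex
\begin{aligned}
N_{1} &= (\mathtt{0x7F} - \mathtt{0x00} + 1) = 128 \\
N_{2} &= \underbrace{(\mathtt{0xFE} - \mathtt{0x81} + 1)}_{126}
        \times \underbrace{(\mathtt{0xFE} - \mathtt{0x40} + 1 - 1)}_{190,\ \mathtt{0x7F}\ \text{excluded}}
      = 23940 \\
N_{4} &= \underbrace{126}_{\text{byte 1}} \times \underbrace{10}_{\text{byte 2}}
        \times \underbrace{126}_{\text{byte 3}} \times \underbrace{10}_{\text{byte 4}}
      = 1587600
\end{aligned}
```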
The GB18030 1-byte code provides support for ASCII. The 2-byte code
provides support for all the CJK (Chinese, Japanese, and Korean)
characters defined in the Unicode 2.1 standard. The 4-byte code
provides support for the Unicode Version 3.0 additions to Version 2.1.
The 4-byte code also leaves a large number of unassigned code points
that are available for future use.
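As an illustration of the code-space table, the following C sketch
reports whether a byte sequence has the shape of a 1-, 2-, or 4-byte
GB18030 code, and numbers the 4-byte codes in byte order from 0 to
1587599. The names gb18030_seq_len() and gb18030_4byte_index() are
hypothetical and not part of any system library. The sketch checks only
byte structure, not whether a code is assigned, and the byte-order
numbering is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* True if b lies in the inclusive range lo..hi (from the table). */
static int in_range(unsigned char b, unsigned char lo, unsigned char hi)
{
    return b >= lo && b <= hi;
}

/* Hypothetical helper: length (1, 2, or 4) of the GB18030 code that
 * starts at buf, or 0 if the first n bytes do not match the table.
 * Only the byte structure is checked, not whether the code is assigned. */
size_t gb18030_seq_len(const unsigned char *buf, size_t n)
{
    if (n == 0)
        return 0;
    if (buf[0] <= 0x7F)                          /* 1-byte: ASCII */
        return 1;
    if (!in_range(buf[0], 0x81, 0xFE) || n < 2)  /* bad lead or truncated */
        return 0;
    if (in_range(buf[1], 0x40, 0xFE) && buf[1] != 0x7F)
        return 2;                                /* 2-byte code */
    if (in_range(buf[1], 0x30, 0x39) && n >= 4 &&
        in_range(buf[2], 0x81, 0xFE) && in_range(buf[3], 0x30, 0x39))
        return 4;                                /* 4-byte code */
    return 0;
}

/* Hypothetical helper: position of a 4-byte code in byte order,
 * 0 for 0x81 0x30 0x81 0x30 up to 1587599 for 0xFE 0x39 0xFE 0x39. */
unsigned long gb18030_4byte_index(const unsigned char *b)
{
    assert(gb18030_seq_len(b, 4) == 4);          /* caller passes a 4-byte code */
    return (((unsigned long)(b[0] - 0x81) * 10 + (b[1] - 0x30)) * 126
            + (b[2] - 0x81)) * 10 + (b[3] - 0x30);
}
```

For example, the bytes 0xD6 0xD0 form one 2-byte code, while
0x81 0x30 0x81 0x30 is the first 4-byte code (index 0). Because the
second byte of a 2-byte code never falls in 0x30 to 0x39, the second
byte alone distinguishes the 2-byte and 4-byte forms.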
The GB18030 character set maps the invalid Unicode code points U+FFFE
and U+FFFF to 4-byte codes. Because these two code points are invalid
in UCS, this mapping can cause problems with round-trip character
conversions.
The GB18030 character set does not map any 4-byte code to the UCS
surrogate area (U+D800 through U+DFFF).
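A program that converts UCS text to GB18030 might screen each code
point for the two cases described above before converting it. The
following C sketch is hypothetical: ucs_precheck() and its enumeration
are illustrative names only, and how a real converter reports these
cases is not specified on this page.

```c
#include <stdint.h>

/* Hypothetical result codes for screening a UCS code point before
 * converting it to GB18030. */
enum ucs_check {
    UCS_OK,             /* neither caveat on this page applies */
    UCS_SURROGATE,      /* U+D800..U+DFFF: no GB18030 code maps here */
    UCS_ROUNDTRIP_RISK  /* U+FFFE, U+FFFF: mapped to 4-byte codes but
                           invalid in UCS, so round trips may fail */
};

enum ucs_check ucs_precheck(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return UCS_SURROGATE;
    if (cp == 0xFFFE || cp == 0xFFFF)
        return UCS_ROUNDTRIP_RISK;
    return UCS_OK;
}
```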
Codeset Converters for GB18030
The following codeset converter pairs are available for converting
Simplified Chinese characters between GB18030 and UCS formats. Refer to
Unicode(5) for more information about the UTF-16, UCS-4, and UTF-8
encoding formats. Refer to iconv_intro(5) for an introduction to
codeset conversion.
UTF-16_GB18030, GB18030_UTF-16
        Converting from and to UTF-16 format
UCS-4_GB18030, GB18030_UCS-4
        Converting from and to UCS-4 format
UTF-8_GB18030, GB18030_UTF-8
        Converting from and to UTF-8 format
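Programs typically use codeset converters through the iconv(3)
interface rather than by converter name. The following C sketch
converts a UTF-8 string to GB18030 with the POSIX iconv_open(),
iconv(), and iconv_close() functions. The codeset names "GB18030" and
"UTF-8" passed to iconv_open() are an assumption: GNU libc and GNU
libiconv accept them, but see iconv_intro(5) for the names this system
recognizes. The helper name utf8_to_gb18030() is hypothetical.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated UTF-8 string `in`
 * to GB18030 in `out`. Returns the number of bytes written, or
 * (size_t)-1 if the converter is unavailable or conversion fails.
 * Assumption: this iconv accepts the names "GB18030" and "UTF-8". */
size_t utf8_to_gb18030(const char *in, char *out, size_t outsize)
{
    iconv_t cd = iconv_open("GB18030", "UTF-8");   /* (tocode, fromcode) */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;        /* iconv() takes a non-const pointer */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft); /* flush final state */
    iconv_close(cd);

    return rc == (size_t)-1 ? (size_t)-1 : outsize - outleft;
}
```

With such an iconv, converting "A" followed by U+4E2D (中) yields the
three bytes 0x41 0xD6 0xD0 (a 1-byte and a 2-byte code), and U+0080
yields the 4-byte code 0x81 0x30 0x81 0x30.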
Fonts for GB18030
The operating system provides the following Simplified Chinese TrueType
fonts for GB18030:
        -css_dongwen-fangsong-medium-r-normal--0-0-0-0-c-0-iso8859-1
        -css_dongwen-fangsong-medium-r-normal--0-0-0-0-c-0-iso10646-1
        -css_dongwen-heiti-medium-r-normal--0-0-0-0-c-0-iso8859-1
        -css_dongwen-heiti-medium-r-normal--0-0-0-0-c-0-iso10646-1
        -css_dongwen-kaiti-medium-r-normal--0-0-0-0-c-0-iso8859-1
        -css_dongwen-kaiti-medium-r-normal--0-0-0-0-c-0-iso10646-1
        -css_dongwen-songti-medium-r-normal--0-0-0-0-c-0-iso8859-1
        -css_dongwen-songti-medium-r-normal--0-0-0-0-c-0-iso10646-1
These fonts can be used for printing with Chinese text printers. The
operating system uses Unicode fonts in the SongTi font style as the
default screen font for the GB18030 codeset. See wwpsof(8) for
information on the PostScript print filter and TrueType fonts.
SEE ALSO
Commands: locale(1)
Others: ascii(5), big5(5), Chinese(5), dechanyu(5), dechanzi(5),
eucTW(5), GBK(5), i18n_intro(5), i18n_printing(5), l10n_intro(5),
sbig5(5), telecode(5)