For Metadata - Oracle 5.0 Reference Manual

The MySQL implementation of UCS-2 stores characters in big-endian byte order and does not use a
byte order mark (BOM) at the beginning of values. Other database systems might use little-endian byte
order or a BOM. In such cases, conversion of values will need to be performed when transferring data
between those systems and MySQL.
MySQL uses no BOM for UTF-8 values.
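As an illustration of the conversion mentioned above, the following Python sketch normalizes 2-byte Unicode data from another system to big-endian order without a BOM. The function names are hypothetical, and the sketch assumes that BOM-less input is little-endian unless told otherwise. Python has no ucs2 codec, so utf-16-be stands in for it here, which is equivalent only for BMP characters.

```python
import codecs


def to_big_endian_no_bom(raw: bytes, assumed_order: str = "utf-16-le") -> bytes:
    """Return 2-byte Unicode data as big-endian bytes with no BOM.

    assumed_order is used only when the input carries no BOM
    (an assumption made for this sketch).
    """
    if raw.startswith(codecs.BOM_UTF16_LE):
        text = raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    elif raw.startswith(codecs.BOM_UTF16_BE):
        text = raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    else:
        text = raw.decode(assumed_order)
    # Re-encode most significant byte first; "utf-16-be" never writes a BOM.
    return text.encode("utf-16-be")


def strip_utf8_bom(raw: bytes) -> bytes:
    """Drop a leading UTF-8 BOM, if present, before sending data to the server."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw
```

For example, little-endian input with a BOM (FF FE 41 00 4B 04) becomes 00 41 04 4B.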
Client applications that need to communicate with the server using Unicode should set the client
character set accordingly; for example, by issuing a SET NAMES 'utf8' statement. ucs2 cannot be
used as a client character set, which means that it does not work for SET NAMES or SET CHARACTER
SET. (See Section 10.1.4, "Connection Character Sets and Collations".)
The following sections provide additional detail on the Unicode character sets in MySQL.
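10.1.10.1. The ucs2 Character Set (UCS-2 Unicode Encoding)
In UCS-2, every character is represented by a 2-byte Unicode code with the most significant byte
first. For example: LATIN CAPITAL LETTER A has the code 0x0041 and it is stored as a 2-byte
sequence: 0x00 0x41. CYRILLIC SMALL LETTER YERU (Unicode 0x044B) is stored as a 2-byte
sequence: 0x04 0x4B. For Unicode characters and their codes, please refer to the Unicode Home
Page.
In MySQL, the ucs2 character set is a fixed-length 16-bit encoding for Unicode BMP characters.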
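The 2-byte layout described above can be reproduced with a short Python sketch. The helper name ucs2_bytes is hypothetical; it only models the byte layout for a single BMP code point and is not how the server itself is implemented.

```python
def ucs2_bytes(ch: str) -> bytes:
    """Return the 2-byte, most-significant-byte-first form of one BMP character."""
    code = ord(ch)
    if code > 0xFFFF:
        # UCS-2 is a fixed-length 16-bit encoding, so only BMP code points fit.
        raise ValueError("character is outside the BMP")
    return code.to_bytes(2, "big")


# LATIN CAPITAL LETTER A and CYRILLIC SMALL LETTER YERU
for ch in ("A", "\u044b"):
    print(f"U+{ord(ch):04X} -> {ucs2_bytes(ch).hex(' ')}")
```

The two characters from the example yield the byte pairs 00 41 and 04 4b.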
10.1.10.2. The utf8 Character Set (3-Byte UTF-8 Unicode Encoding)
UTF-8 (Unicode Transformation Format with 8-bit units) is an alternative way to store Unicode data.
It is implemented according to RFC 3629, which describes encoding sequences that take from one to
four bytes. Currently, MySQL support for UTF-8 does not include 4-byte sequences. (An older standard
for UTF-8 encoding, RFC 2279, describes UTF-8 sequences that take from one to six bytes. RFC 3629
renders RFC 2279 obsolete; for this reason, sequences with five and six bytes are no longer used.)
The idea of UTF-8 is that various Unicode characters are encoded using byte sequences of different
lengths:
• Basic Latin letters, digits, and punctuation signs use one byte.
• Most European and Middle East script letters fit into a 2-byte sequence: extended Latin letters (with
tilde, macron, acute, grave and other accents), Cyrillic, Greek, Armenian, Hebrew, Arabic, Syriac,
and others.
• Korean, Chinese, and Japanese ideographs use 3-byte sequences.
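The sequence lengths listed above can be checked with Python's built-in UTF-8 encoder, as in the sketch below. The helper names utf8_len and fits_three_byte_utf8 are hypothetical; the second one approximates the 3-byte limit described in this section by rejecting any character that needs a 4-byte sequence.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes in the UTF-8 sequence for one character."""
    return len(ch.encode("utf-8"))


def fits_three_byte_utf8(text: str) -> bool:
    """True if every character encodes to at most three bytes."""
    return all(utf8_len(ch) <= 3 for ch in text)


samples = {
    "A": "Basic Latin letter",
    "\u00e9": "Latin letter with acute accent",
    "\u044b": "Cyrillic letter",
    "\u05d0": "Hebrew letter",
    "\ud55c": "Korean syllable",
    "\u4e2d": "Chinese ideograph",
    "\U0001F600": "character outside the BMP",
}

for ch, label in samples.items():
    print(f"U+{ord(ch):04X} {label}: {utf8_len(ch)} byte(s)")
```

A check such as fits_three_byte_utf8 could be run on client data before it is sent, since characters that need four bytes fall outside what this version's utf8 character set supports.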
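Tip: To save space with UTF-8, use VARCHAR instead of CHAR. Otherwise, MySQL must reserve three
bytes for each character in a CHAR CHARACTER SET utf8 column because that is the maximum
possible length. For example, MySQL must reserve 30 bytes for a CHAR(10) CHARACTER SET utf8
column.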
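The arithmetic behind the tip can be sketched in Python. The names below are hypothetical, and the comparison is deliberately simplified: it counts only the encoded character bytes of a value and ignores the length prefix that VARCHAR stores alongside each value.

```python
MAX_BYTES_PER_CHAR = 3  # longest sequence in the 3-byte utf8 character set


def char_reserved_bytes(declared_length: int) -> int:
    """Bytes reserved for a CHAR(declared_length) CHARACTER SET utf8 column."""
    return declared_length * MAX_BYTES_PER_CHAR


def encoded_bytes(value: str) -> int:
    """Bytes the value itself needs when encoded as UTF-8."""
    return len(value.encode("utf-8"))


reserved = char_reserved_bytes(10)  # CHAR(10) CHARACTER SET utf8
needed = encoded_bytes("hello")     # five Basic Latin letters
print(f"reserved: {reserved} bytes, needed by value: {needed} bytes")
```

For a five-letter Basic Latin value, the CHAR(10) column reserves 30 bytes while the value itself needs only 5, which is the gap a variable-length column avoids.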
10.1.11. UTF-8 for Metadata
Metadata is "the data about the data." Anything that describes the database—as opposed to being
the contents of the database—is metadata. Thus column names, database names, user names,
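version names, and most of the string results from SHOW are metadata. This is also true of the contents
of tables in INFORMATION_SCHEMA because those tables by definition contain information about
database objects.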
Representation of metadata must satisfy these requirements:
• All metadata must be in the same character set. Otherwise, neither the SHOW statements nor SELECT
statements for tables in INFORMATION_SCHEMA would work properly because different rows in the
same column of the results of these operations would be in different character sets.
