Oracle 5.0 Reference Manual page 2922

MySQL 5.0 FAQ: MySQL Chinese, Japanese, and Korean Character Sets
Since the character set appears to be correct, let's see what information the INFORMATION_SCHEMA.COLUMNS table can provide about this column:

mysql> SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
    -> FROM INFORMATION_SCHEMA.COLUMNS
    -> WHERE COLUMN_NAME = 's1'
    -> AND TABLE_NAME = 't';
+-------------+--------------------+-----------------+
| COLUMN_NAME | CHARACTER_SET_NAME | COLLATION_NAME  |
+-------------+--------------------+-----------------+
| s1          | ucs2               | ucs2_general_ci |
+-------------+--------------------+-----------------+
1 row in set (0.01 sec)

(See Section 19.4, "The INFORMATION_SCHEMA COLUMNS Table", for more information.)

You can see that the collation is ucs2_general_ci instead of ucs2_unicode_ci. The reason why this is so can be found using SHOW CHARSET, as shown here:

mysql> SHOW CHARSET LIKE 'ucs2%';
+---------+---------------+-------------------+--------+
| Charset | Description   | Default collation | Maxlen |
+---------+---------------+-------------------+--------+
| ucs2    | UCS-2 Unicode | ucs2_general_ci   |      2 |
+---------+---------------+-------------------+--------+
1 row in set (0.00 sec)

For ucs2 and utf8, the default collation is "general". To specify a Unicode collation, use COLLATE ucs2_unicode_ci.
B.11.16: Why are my supplementary characters rejected by MySQL?
Before MySQL 5.5.3, MySQL does not support supplementary characters (that is, characters that need more than 3 bytes) for UTF-8. We support only what Unicode calls the Basic Multilingual Plane (BMP), also known as Plane 0. Only a few very rare Han characters are supplementary; support for them is uncommon. This has led to reports such as Bug #12600, which we rejected as "not a bug". With utf8, we must truncate an input string when we encounter bytes that we don't understand. Otherwise, we wouldn't know how long the bad multibyte character is.
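To make the 3-byte limit concrete, here is a minimal Python sketch. It uses Python's standard UTF-8 codec, not MySQL, to show that BMP characters need at most 3 bytes while a supplementary character such as U+20000 needs 4. The helper truncate_like_utf8 is hypothetical: it only emulates the truncation behavior described above (cutting the string at the first character outside the BMP) and is not MySQL's actual implementation.

```python
# Sketch: why a 3-byte-maximum "utf8" cannot store supplementary characters.
# Uses Python's standard UTF-8 codec; this is an illustration, not MySQL code.

BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane (Plane 0)


def utf8_length(ch: str) -> int:
    """Number of bytes the character needs in standard UTF-8."""
    return len(ch.encode("utf-8"))


def is_supplementary(ch: str) -> bool:
    """True for code points above the BMP, i.e. those needing 4 UTF-8 bytes."""
    return ord(ch) > BMP_MAX


def truncate_like_utf8(text: str) -> str:
    """Hypothetical emulation of the pre-5.5.3 utf8 behavior: keep everything
    before the first supplementary character and drop the rest."""
    for i, ch in enumerate(text):
        if is_supplementary(ch):
            return text[:i]
    return text


if __name__ == "__main__":
    for ch in ["A", "\u00e9", "\u4e2d", "\U00020000"]:
        print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s), "
              f"supplementary={is_supplementary(ch)}")
    print(repr(truncate_like_utf8("ab\U00020000cd")))  # 'ab'
```

Only U+20000 reports 4 bytes, which is exactly the case a 3-byte-per-character utf8 cannot hold.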
One possible workaround is to use ucs2 instead of utf8, in which case the "bad" characters are changed to question marks; however, no truncation takes place. You can also change the data type to BLOB or BINARY, which perform no validity checking.
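The ucs2 workaround described above can be emulated in a few lines of Python. This is a sketch under stated assumptions, not MySQL code: store_like_ucs2 and ucs2_bytes are hypothetical helpers, supplementary characters become "?" as the text describes, and big-endian byte order is chosen only for illustration. Each remaining character takes 2 bytes, matching the Maxlen of 2 reported by SHOW CHARSET.

```python
# Sketch of the ucs2 workaround (an emulation, not MySQL code):
# characters outside the BMP become "?", and the rest of the string is kept.

BMP_MAX = 0xFFFF


def store_like_ucs2(text: str) -> str:
    """Replace each supplementary character with '?' instead of truncating."""
    return "".join(ch if ord(ch) <= BMP_MAX else "?" for ch in text)


def ucs2_bytes(text: str) -> bytes:
    """Encode as UCS-2, 2 bytes per character (big-endian chosen for illustration).
    For BMP characters (assuming no lone surrogates), UTF-16BE and UCS-2 coincide."""
    return store_like_ucs2(text).encode("utf-16-be")


if __name__ == "__main__":
    original = "ab\U00020000cd"
    print(store_like_ucs2(original))   # ab?cd
    print(len(ucs2_bytes(original)))   # 10: five characters x 2 bytes
```

Unlike the truncation sketch for utf8, the characters after the unsupported one ("cd") survive.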
As of MySQL 5.5.3, Unicode support is extended to include supplementary characters by means of
additional Unicode character sets: utf16, utf32, and 4-byte utf8mb4. These character sets support
supplementary Unicode characters outside the Basic Multilingual Plane (BMP).
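As a rough illustration of what these character sets cost per character, the following Python sketch assumes that utf8mb4, utf16, and utf32 follow the standard UTF-8, UTF-16, and UTF-32 encodings, and uses Python's codecs as stand-ins (big-endian variants avoid a byte-order mark). The dictionary CODECS and the function bytes_per_char are illustrative names, not MySQL APIs.

```python
# Sketch: storage size per character in the encodings that the 5.5.3+
# character sets are named after. Python's standard codecs stand in for MySQL.

CODECS = {"utf8mb4": "utf-8", "utf16": "utf-16-be", "utf32": "utf-32-be"}


def bytes_per_char(ch: str) -> dict:
    """Map each character set name to the byte length of `ch` in that encoding."""
    return {name: len(ch.encode(codec)) for name, codec in CODECS.items()}


if __name__ == "__main__":
    for ch in ["A", "\u4e2d", "\U00020000"]:
        print(f"U+{ord(ch):04X}", bytes_per_char(ch))
```

A supplementary character such as U+20000 takes 4 bytes in all three encodings, which is why none of them can be limited to 3 bytes per character the way the original utf8 is.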
B.11.17: Shouldn't it be "CJKV"?
No. The term "CJKV" (Chinese Japanese Korean Vietnamese) refers to Vietnamese character sets
which contain Han (originally Chinese) characters. MySQL has no plan to support the old Vietnamese
script using Han characters. MySQL does of course support the modern Vietnamese script with
Western characters.
As of MySQL 5.6, there are Vietnamese collations for Unicode character sets, as described in Section 10.1.13.1, "Unicode Character Sets".
B.11.18: Does MySQL allow CJK characters to be used in database and table names?
This issue is fixed in MySQL 5.1 by automatically rewriting the names of the corresponding directories and files.
For example, if you create a database with a CJK name on a server whose operating system does not support CJK in directory names, MySQL creates a directory named @0w@00a5@00ae, which is just a fancy way
