MySQL 5.0 FAQ: MySQL Chinese, Japanese, and Korean Character Sets
| character_set_results    | utf8                                   |
| character_set_server     | utf8                                   |
| character_set_system     | utf8                                   |
| character_sets_dir       | /usr/local/mysql/share/mysql/charsets/ |
+--------------------------+----------------------------------------+
8 rows in set (0.01 sec)
Now stop the client, and then stop the server using mysqladmin. Then start the server again, but this
time tell it to skip the handshake like so:
mysqld --character-set-server=utf8 --skip-character-set-client-handshake
Start the client with utf8 once again as the default character set, then display the current settings:
mysql> SHOW VARIABLES LIKE 'char%';
+--------------------------+----------------------------------------+
| Variable_name            | Value                                  |
+--------------------------+----------------------------------------+
| character_set_client     | latin1                                 |
| character_set_connection | latin1                                 |
| character_set_database   | latin1                                 |
| character_set_filesystem | binary                                 |
| character_set_results    | latin1                                 |
| character_set_server     | utf8                                   |
| character_set_system     | utf8                                   |
| character_sets_dir       | /usr/local/mysql/share/mysql/charsets/ |
+--------------------------+----------------------------------------+
8 rows in set (0.01 sec)
As you can see by comparing the differing results from SHOW VARIABLES, the server ignores the
client's initial settings if the --skip-character-set-client-handshake option is used.
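A client that can issue SQL statements can produce the equivalent settings itself, with no server
restart. As a minimal sketch (not part of the manual's example), SET NAMES assigns all three
client-related character set variables for the current session:

mysql> SET NAMES 'utf8';
mysql> SELECT @@character_set_client, @@character_set_connection,
    ->        @@character_set_results;

Unlike --skip-character-set-client-handshake, this depends on the client cooperating, which is
exactly what pre-4.1 applications cannot be assumed to do.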
B.11.12: Why do some LIKE and FULLTEXT searches with CJK characters fail?
There is a very simple problem with LIKE searches on BINARY and BLOB columns: we need to
know the end of a character. With multi-byte character sets, different characters might have different
octet lengths. For example, in utf8, A requires one byte but ペ requires three bytes, as shown here:
+-------------------------+---------------------------+
| OCTET_LENGTH(_utf8 'A') | OCTET_LENGTH(_utf8 'ペ') |
+-------------------------+---------------------------+
|                       1 |                         3 |
+-------------------------+---------------------------+
1 row in set (0.00 sec)
If we don't know where the first character ends, then we don't know where the second character
begins, in which case even very simple searches such as LIKE '_A%' fail. The solution is to use
a regular CJK character set in the first place, or to convert to a CJK character set before comparing.
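Both the failure and the fix can be seen in a minimal sketch (the table and column names here are
hypothetical, not from the manual):

mysql> CREATE TABLE t1 (bin_col VARBINARY(10));
mysql> INSERT INTO t1 VALUES (_utf8 'ペA');
-- LIKE compares a binary column byte by byte, so the one-character
-- wildcard '_' consumes only the first of the three bytes of 'ペ',
-- and the pattern matches nothing:
mysql> SELECT bin_col FROM t1 WHERE bin_col LIKE '_A%';
-- Converting to a character set first restores character boundaries,
-- and the same pattern matches the row:
mysql> SELECT bin_col FROM t1 WHERE CONVERT(bin_col USING utf8) LIKE '_A%';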
This is one reason why MySQL cannot allow encodings of nonexistent characters. If it is not strict about
rejecting bad input, then it has no way of knowing where characters end.
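For example (a hypothetical table; the exact response depends on the server's SQL mode), a byte
sequence that cannot occur in well-formed utf8 is flagged rather than stored verbatim:

mysql> CREATE TABLE t2 (c CHAR(1) CHARACTER SET utf8);
-- 0x80 is a lone continuation byte, never the start of a utf8
-- character; the server responds with error or warning 1366
-- (Incorrect string value) instead of storing the byte as-is:
mysql> INSERT INTO t2 VALUES (0x80);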
For FULLTEXT searches, we need to know where words begin and end. With Western languages,
this is rarely a problem because most (if not all) of these use an easy-to-identify word boundary:
the space character. However, this is not usually the case with Asian writing. We could use arbitrary
halfway measures, like assuming that all Han characters represent words, or (for Japanese) depending
on changes from Katakana to Hiragana due to grammatical endings. However, the only sure solution
requires a comprehensive word list, which means that we would have to include a dictionary in the
server for each Asian language supported. This is simply not feasible.
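The symptom is easy to reproduce. In this sketch (a hypothetical table; 'データベース管理システム'
is Japanese for 'database management system'), nothing separates the words, so the FULLTEXT parser
indexes the whole run of characters as one word, and a search for any fragment of it finds nothing:

mysql> CREATE TABLE articles (body TEXT, FULLTEXT (body))
    ->        ENGINE=MyISAM DEFAULT CHARSET=utf8;
mysql> INSERT INTO articles VALUES ('データベース管理システム');
mysql> SELECT body FROM articles WHERE MATCH(body) AGAINST ('データベース');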
B.11.13: How do I know whether character X is available in all character sets?
The majority of simplified Chinese and basic nonhalfwidth Japanese Kana characters appear in all
CJK character sets. This stored procedure accepts a UCS-2 Unicode character, converts it to all other
character sets, and displays the results in hexadecimal.
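Before the procedure itself (which begins below and continues on the following page), here is an
unofficial single-statement sketch of the technique it relies on: CONVERT ... USING re-encodes a
UCS-2 character into a target character set, and HEX() displays the resulting bytes; a character
missing from the target set comes back as 3F, the encoding of '?'. The value 0x9F9C used here is
U+9F9C (龜):

mysql> SELECT HEX(CONVERT(_ucs2 0x9F9C USING big5))   AS big5,
    ->        HEX(CONVERT(_ucs2 0x9F9C USING gb2312)) AS gb2312,
    ->        HEX(CONVERT(_ucs2 0x9F9C USING sjis))   AS sjis,
    ->        HEX(CONVERT(_ucs2 0x9F9C USING euckr))  AS euckr;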
DELIMITER //