+----------------+---------+----+---------+----------+---------+
10.4.4. Adding a UCA Collation to a Unicode Character Set
This section describes how to add a UCA collation for a Unicode character set by writing the
<collation> element within a <charset> character set description in the MySQL Index.xml file.
The procedure described here does not require recompiling MySQL. It uses a subset of the Locale
Data Markup Language (LDML) specification, which is available at http://www.unicode.org/reports/tr35/.
In 5.0, this method of adding collations is supported as of MySQL 5.0.46. With this method, you need
not define the entire collation. Instead, you begin with an existing "base" collation and describe the new
collation in terms of how it differs from the base collation. The following table lists the base collations of
the Unicode character sets for which UCA collations can be defined.
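Table 10.1. MySQL Character Sets Available for User-Defined UCA Collations
+---------------+-----------------+
| Character Set | Base Collation  |
+---------------+-----------------+
| utf8          | utf8_unicode_ci |
| ucs2          | ucs2_unicode_ci |
+---------------+-----------------+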
The following sections show how to add a collation that is defined using LDML syntax, and provide a
summary of LDML rules supported in MySQL.
10.4.4.1. Defining a UCA Collation using LDML Syntax
To add a UCA collation for a Unicode character set without recompiling MySQL, use the
following procedure. If you are unfamiliar with the LDML rules used to describe the collation's sort
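characteristics, see Section 10.4.4.2, "LDML Syntax Supported in MySQL".
The example adds a collation named utf8_phone_ci to the utf8 character set. The collation is
designed for a scenario involving a Web application for which users post their names and phone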
numbers. Phone numbers can be given in very different formats:
+7-12345-67
+7-12-345-67
+7 12 345 67
+7 (12) 345 67
+71234567
The problem raised by dealing with these kinds of values is that the varying permissible formats make
searching for a specific phone number very difficult. The solution is to define a new collation that
reorders punctuation characters, making them ignorable.
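To illustrate what "ignorable" means for comparisons, here is a minimal Python sketch. It is not MySQL code and does not reproduce how the server implements collations. It only shows that once a set of punctuation characters is skipped, the sample formats above compare as equal. The IGNORABLE set and the helper names comparison_key and phones_equal are assumptions made for this illustration.

```python
# Illustration only: emulate "ignorable" characters in plain Python.
# IGNORABLE is an assumed set covering the punctuation in the sample formats:
# space, parentheses, plus sign, and hyphen.
IGNORABLE = set(" ()+-")


def comparison_key(value: str) -> str:
    """Return the value with every ignorable character removed."""
    return "".join(ch for ch in value if ch not in IGNORABLE)


def phones_equal(a: str, b: str) -> bool:
    """Compare two phone numbers while skipping ignorable characters."""
    return comparison_key(a) == comparison_key(b)


# The five sample formats from the text above.
SAMPLES = [
    "+7-12345-67",
    "+7-12-345-67",
    "+7 12 345 67",
    "+7 (12) 345 67",
    "+71234567",
]

if __name__ == "__main__":
    # All samples reduce to the same comparison key.
    print({comparison_key(s) for s in SAMPLES})
```

A collation with ignorable punctuation has the same effect inside the server, so a search for any one of these formats can match rows stored in any other.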
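1. Choose a collation ID, as shown in Section 10.4.2, "Choosing a Collation ID". The following steps
use an ID of 252.
2. Modify the Index.xml configuration file. This file is located in the directory named by
the character_sets_dir system variable. You can check the variable value as follows,
although the path name might be different on your system: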
mysql> SHOW VARIABLES LIKE 'character_sets_dir';
+--------------------+-----------------------------------------+
| Variable_name      | Value                                   |
+--------------------+-----------------------------------------+
| character_sets_dir | /user/local/mysql/share/mysql/charsets/ |
+--------------------+-----------------------------------------+
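3. Choose a name for the collation and list it in the Index.xml file. In addition, you'll need to provide
the collation ordering rules. Find the <charset> element for the character set to which the
collation is being added, and add a <collation> element that indicates the collation name and
ID, to associate the name with the ID. Within the <collation> element, provide a <rules>
element containing the ordering rules: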
<charset name="utf8">
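As a rough sketch of the nesting that step 3 describes, the following Python script builds a <collation> element with a <rules> child inside the <charset> element for utf8. It uses the name utf8_phone_ci and ID 252 from the steps above. Everything else is an assumption made for illustration: the stand-in INDEX_XML document and its <charsets> root, the add_collation helper, and the <reset> and <i> rule elements with \uXXXX escapes. Check Section 10.4.4.2 for the rule syntax that MySQL actually supports.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

# Assumed stand-in for the relevant part of Index.xml. The real file holds
# many more elements and attributes than shown here.
INDEX_XML = """<charsets>
  <charset name="utf8">
  </charset>
</charsets>"""


def add_collation(index_xml: str, charset: str, name: str,
                  coll_id: int, ignorable: Iterable[str]) -> str:
    """Return index_xml with a new <collation> added to the named <charset>."""
    root = ET.fromstring(index_xml)
    for cs in root.iter("charset"):
        if cs.get("name") == charset:
            break
    else:
        raise ValueError("no <charset> element named %r" % charset)

    # The <collation> element associates the collation name with its ID.
    coll = ET.SubElement(cs, "collation", {"name": name, "id": str(coll_id)})

    # The <rules> element holds the ordering rules. The rule element names
    # used here (<reset>, <i>) are assumptions, see the note above.
    rules = ET.SubElement(coll, "rules")
    ET.SubElement(rules, "reset").text = r"\u0000"
    for code_point in ignorable:
        ET.SubElement(rules, "i").text = code_point

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Space, parentheses, plus sign, hyphen: the punctuation in the samples.
    punctuation = [r"\u0020", r"\u0028", r"\u0029", r"\u002B", r"\u002D"]
    print(add_collation(INDEX_XML, "utf8", "utf8_phone_ci", 252, punctuation))
```

The script only prints the modified XML. It does not write to the file in character_sets_dir, which is best edited by hand after making a backup.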