Oracle 5.0 Reference Manual page 957

|  2 |                                       0 |
|  3 |                        0.66266459226608 |
|  4 |                                       0 |
|  5 |                                       0 |
|  6 |                                       0 |
+----+-----------------------------------------+
6 rows in set (0.00 sec)
The following example is more complex. The query returns the relevance values and it also sorts the
rows in order of decreasing relevance. To achieve this result, you should specify MATCH() twice: once
in the SELECT list and once in the WHERE clause. This causes no additional overhead, because the
MySQL optimizer notices that the two MATCH() calls are identical and invokes the full-text search
code only once.
mysql> SELECT id, body, MATCH (title,body) AGAINST
    -> ('Security implications of running MySQL as root') AS score
    -> FROM articles WHERE MATCH (title,body) AGAINST
    -> ('Security implications of running MySQL as root');
+----+-------------------------------------+-----------------+
| id | body                                | score           |
+----+-------------------------------------+-----------------+
|  4 | 1. Never run mysqld as root. 2. ... | 1.5219271183014 |
|  6 | When configured properly, MySQL ... | 1.3114095926285 |
+----+-------------------------------------+-----------------+
2 rows in set (0.00 sec)
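The MySQL FULLTEXT implementation regards any sequence of true word characters (letters, digits,
and underscores) as a word. That sequence may also contain apostrophes ("'"), but not more than one
in a row. This means that aaa'bbb is regarded as one word, but aaa''bbb is regarded as two words.
Apostrophes at the beginning or the end of a word are stripped by the FULLTEXT parser; 'aaa'bbb'
would be parsed as aaa'bbb.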
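The word rules above can be approximated outside the server with a short Python sketch. It illustrates the rules as stated here and is not the FULLTEXT parser itself; the function name tokenize and the regular expression are assumptions made for this example.

```python
import re

# One or more word characters (letters, digits, underscore), optionally
# followed by groups of a single apostrophe plus more word characters.
# Two apostrophes in a row therefore end the word, and apostrophes at the
# start or end of a word are never part of a match.
WORD = re.compile(r"\w+(?:'\w+)*")

def tokenize(text):
    """Return the words in text under the simplified rules above."""
    return WORD.findall(text)

print(tokenize("aaa'bbb"))    # one word
print(tokenize("aaa''bbb"))   # two words
print(tokenize("'aaa'bbb'"))  # outer apostrophes are stripped
```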
The FULLTEXT parser determines where words start and end by looking for certain delimiter
characters; for example, " " (space), "," (comma), and "." (period). If words are not separated by
delimiters (as in, for example, Chinese), the FULLTEXT parser cannot determine where a word begins
or ends. To be able to add words or other indexed terms in such languages to a FULLTEXT index, you
must preprocess them so that they are separated by some arbitrary delimiter such as """.
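A minimal Python sketch of such a preprocessing step follows. It assumes the text has already been split into terms by some external segmentation step, which is outside the scope of this section; the function name join_terms, the sample terms, and the choice of a space as the delimiter are illustrative assumptions.

```python
def join_terms(terms, delimiter=" "):
    """Join pre-segmented terms so that a delimiter separates each pair."""
    for term in terms:
        if delimiter in term:
            # A term containing the delimiter would be split into two words.
            raise ValueError(f"term {term!r} contains the delimiter")
    return delimiter.join(terms)

# Hypothetical output of a segmentation step for a Chinese phrase.
terms = ["全文", "检索", "索引"]
print(join_terms(terms))
```

The joined string is what would be stored in the indexed column. The rules for ignored words still apply to each resulting term.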
Some words are ignored in full-text searches:
• Any word that is too short is ignored. The default minimum length of words that are found by full-text
searches is four characters.
• Words in the stopword list are ignored. A stopword is a word such as "the" or "some" that is so
common that it is considered to have zero semantic value. There is a built-in stopword list, but it can
be overridden by a user-defined list.
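The default stopword list is given in Section 12.9.4, "Full-Text Stopwords". The default minimum word
length and stopword list can be changed as described in Section 12.9.6, "Fine-Tuning MySQL Full-Text
Search".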
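The two rules for ignored words can be sketched as a filter in Python. The names indexable_words, MIN_WORD_LEN, and STOPWORDS are hypothetical, the stopword set holds only the two examples from the text rather than the built-in list, and the case-insensitive comparison is an assumption of this sketch.

```python
MIN_WORD_LEN = 4             # default minimum word length stated in the text
STOPWORDS = {"the", "some"}  # only the two examples from the text, not the built-in list

def indexable_words(words, min_len=MIN_WORD_LEN, stopwords=STOPWORDS):
    """Drop words that are too short or that appear in the stopword set."""
    return [
        w for w in words
        if len(w) >= min_len and w.lower() not in stopwords  # assumed case-insensitive
    ]

# "run" and "as" are too short; "root" has exactly four characters and is kept.
print(indexable_words(["Never", "run", "mysqld", "as", "root"]))
```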
Every correct word in the collection and in the query is weighted according to its significance in the
collection or query. Consequently, a word that is present in many documents has a lower weight
(and may even have a zero weight), because it has lower semantic value in this particular collection.
Conversely, if the word is rare, it receives a higher weight. The weights of the words are combined to
compute the relevance of the row.
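The effect described above can be illustrated with a textbook inverse document frequency weight, log(N/n), where N is the number of documents and n is the number that contain the word. This is a stand-in chosen for the sketch, not the formula MySQL uses; the function names and the sample documents are assumptions.

```python
import math

def idf_weights(docs):
    """Map each word to log(N / n), where n is the number of docs containing it."""
    total = len(docs)
    doc_freq = {}
    for words in docs:
        for word in set(words):  # count each word once per document
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word: math.log(total / n) for word, n in doc_freq.items()}

def relevance(query_words, doc_words, weights):
    """Sum the weights of the distinct query words that occur in the document."""
    present = set(doc_words)
    return sum(weights.get(w, 0.0) for w in set(query_words) if w in present)

# Hypothetical collection in which "mysql" occurs in every document.
docs = [
    ["mysql", "tutorial"],
    ["mysql", "security"],
    ["mysql", "tutorial", "tuning"],
    ["mysql", "security", "root"],
]
weights = idf_weights(docs)
print(weights["mysql"])   # word present in every document
print(weights["tuning"])  # word present in one document
print(relevance(["security", "root"], docs[3], weights))
```

Under this stand-in weight, a word that occurs in every document contributes nothing to the relevance of any row, while a rare word contributes the most.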
Such a technique works best with large collections (in fact, it was carefully tuned this way). For very
small tables, word distribution does not adequately reflect their semantic value, and this model may
sometimes produce bizarre results. For example, although the word "MySQL" is present in every row of
the articles table shown earlier, a search for the word produces no results:
mysql> SELECT * FROM articles
    -> WHERE MATCH (title,body) AGAINST ('MySQL');
Empty set (0.00 sec)
Natural Language Full-Text Searches
