Stemming Words in MySQL


Question

I want to use word stemming in MySQL. For example, a user might search for "testing", "tested", or "tests". All of these words are related because they share the base word "test". Is there a way to get this kind of result, or a function that does it?

Recommended answer

MySQL full-text search

Historically, full-text search was supported only by the MyISAM storage engine. Since version 5.6, MySQL also supports full-text search in the InnoDB storage engine. This is good news, because it lets developers benefit from InnoDB's referential integrity, transaction support, and row-level locking.

There are basically two approaches to full-text search in MySQL: natural language mode and boolean mode. (A third option, query expansion, augments a natural language search with a second, expanded query.)

The main difference between the natural language and boolean modes is that boolean mode allows certain operators as part of the search string. For instance, operators can mark a word as more relevant than the others in the query, or require that a specific word be present in the results. In both modes, MySQL computes a relevance score during the search, and results can be sorted by it.
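
As a rough illustration of those operators, the sketch below assumes the artists table and the FULLTEXT index on name that are created later in this answer; the search terms are made up for the example. A leading + makes a word mandatory, a leading - excludes rows that contain the word, and a word without an operator is optional but raises the score of rows that contain it:

```sql
-- Rows must contain "Jackson" and must not contain "Janet".
SELECT id, name
FROM artists
WHERE MATCH (name) AGAINST ('+Jackson -Janet' IN BOOLEAN MODE);

-- Rows must contain "Jackson"; rows that also contain "Michael" rank higher.
SELECT id, name,
       MATCH (name) AGAINST ('+Jackson Michael' IN BOOLEAN MODE) AS score
FROM artists
WHERE MATCH (name) AGAINST ('+Jackson Michael' IN BOOLEAN MODE)
ORDER BY score DESC;
```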

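The best fit for our problem was InnoDB full-text search in boolean mode. Why?

  • We had very little time to implement the search feature.
  • At this point, we had no big data to process and no heavy load that would call for something like Elasticsearch or Sphinx.
  • We were on shared hosting that does not support Elasticsearch or Sphinx, and the hardware was quite limited at this stage.
  • Although we wanted word stemming in our search feature, it was not a deal breaker: we could implement it (within constraints) with some simple PHP coding and data denormalization.
  • Full-text search in boolean mode can search for words with wildcards (for the stemmed words) and sort the results by relevance.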

In the normalized Vertabelo model

Let's see how a simple search works. We'll create a sample table first:

CREATE TABLE artists (
    id int(11) NOT NULL AUTO_INCREMENT,
    name varchar(255) NOT NULL,
    bio text NOT NULL,
    CONSTRAINT artists_pk PRIMARY KEY (id)
) ENGINE = InnoDB;

CREATE FULLTEXT INDEX artists_idx_1 ON artists (name);
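
To have something to search, a handful of rows can be loaded into the table. The rows below are placeholders chosen only so that the queries in this answer have something to match; the bio text is dummy content:

```sql
-- Illustrative sample rows; bio values are dummy text (the column is NOT NULL).
INSERT INTO artists (name, bio) VALUES
    ('Michael Jackson', 'Sample bio.'),
    ('Janet Jackson',   'Sample bio.'),
    ('Michelle Branch', 'Sample bio.'),
    ('Mick Jagger',     'Sample bio.');
```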

In natural language mode

You can insert some sample data and start testing. (It would be good to add it to your sample dataset.) For instance, we'll try searching for Michael Jackson:

SELECT
    *
FROM
    artists
WHERE
    MATCH (artists.name) AGAINST ('Michael Jackson' IN NATURAL LANGUAGE MODE);

This query finds records that match the search terms and sorts them by relevance: the better the match, the higher the record appears in the results.

In boolean mode

We can perform the same search in boolean mode. If we don't apply any operators to the query, the only difference is that the results are not automatically sorted by relevance; sorting them requires an explicit ORDER BY on the relevance score:

SELECT
    *
FROM
    artists
WHERE
    MATCH (artists.name) AGAINST ('Michael Jackson' IN BOOLEAN MODE);
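
If relevance ordering is wanted in boolean mode, one common approach is to select the MATCH ... AGAINST expression as a column and sort on it. A minimal sketch, where the score alias is just a name chosen for this example:

```sql
-- Same boolean-mode search, with the relevance score exposed and used for sorting.
SELECT
    id,
    name,
    MATCH (name) AGAINST ('Michael Jackson' IN BOOLEAN MODE) AS score
FROM
    artists
WHERE
    MATCH (name) AGAINST ('Michael Jackson' IN BOOLEAN MODE)
ORDER BY
    score DESC;
```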

Wildcards in boolean mode

Since we want to search for stemmed and partial words, we need the wildcard operator (*), which is appended to the word it applies to. This operator is available in boolean mode searches, which is why we chose that mode.

So, let's put boolean search to work and search for part of an artist's name. We'll use the wildcard operator to match any artist whose name contains a word starting with "Mich":

SELECT
    *
FROM
    artists
WHERE
    MATCH (name) AGAINST ('Mich*' IN BOOLEAN MODE);
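
Applied to the original question, the same operator can stand in for stemming: if the application reduces "testing", "tested", or "tests" to the prefix "test" (that reduction step is assumed here and not shown), a single wildcard query covers all three forms. The sketch below searches the bio column, which would need its own FULLTEXT index; the index name artists_idx_2 is hypothetical:

```sql
-- Hypothetical second index so that the bio column can be searched too.
CREATE FULLTEXT INDEX artists_idx_2 ON artists (bio);

-- Matches words that begin with "test": test, tests, tested, testing, ...
SELECT id, name
FROM artists
WHERE MATCH (bio) AGAINST ('test*' IN BOOLEAN MODE);
```

Keep in mind that this is prefix matching rather than linguistic stemming: test* also matches unrelated words such as "testament", and irregular forms (for example "ran" for "run") are not covered.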
