How do you get your Fulltext boolean search to pick up the term C++?


Question

So, I need to find out how to do a fulltext boolean search on a MySQL database to return a record containing the term "C++".

I have my SQL search string as:

SELECT * 
FROM mytable 
WHERE MATCH (field1, field2, field3) 
AGAINST ("C++" IN BOOLEAN MODE) 

Although all of my fields contain the string C++, the record is never returned in the search results.

How can I modify MySQL to accommodate this? Is it possible?

The only solution I have found would be to escape the + character when entering my data, as something like "__plus", and then modify my search to accommodate that, but this seems cumbersome and there has to be a better way.

Recommended answer

How can I modify MySQL to accommodate this?

You'll have to change MySQL's idea of what a word is.

Firstly, the default minimum word length is 4. This means that no search term containing only words of fewer than 4 characters will ever match, whether that's ‘C++’ or ‘cpp’. You can configure this using the ft_min_word_len config option, e.g. in your my.cnf (my.ini on Windows):

[mysqld]
ft_min_word_len=3

(Then restart mysqld and rebuild the fulltext indexes.)
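The restart-and-rebuild step could look roughly like the following. This is a minimal sketch, assuming the indexed table is the MyISAM table mytable from the question; for MyISAM, a QUICK repair is enough to rebuild the fulltext index after a word-length change.

```sql
-- After restarting the server, confirm that the new setting is active.
SHOW VARIABLES LIKE 'ft_min_word_len';

-- Rebuild the FULLTEXT index so that existing rows are re-indexed
-- with the new minimum word length (assumes mytable is MyISAM).
REPAIR TABLE mytable QUICK;
```

Rows indexed before the change keep the old word-length rule until the index has been rebuilt, so searches for 3-letter words will keep failing if this step is skipped.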

Secondly, ‘+’ is not considered a letter by MySQL. You can make it a letter, but then you won't be able to search for the word ‘fish’ in the string ‘fish+chips’, so some care is required. And it's not trivial: it requires recompiling MySQL or hacking an existing character set. See the section beginning "If you want to change the set of characters that are considered word characters..." in section 11.8.6 of the doc.

escape the + character during the process of entering my data as something like "__plus" and then modifying my search to accommodate

Yes, something like that is a common solution: you can keep your ‘real’ data (without the escaping) in a primary, definitive table, usually using InnoDB for ACID compliance. Then an auxiliary MyISAM table can be added, containing only the mangled words as bait for the fulltext search. You can also do a limited form of stemming using this approach.
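As a rough illustration of the auxiliary-table idea, here is a sketch under several assumptions: the table name mytable_search, the column bait, the index name ft_bait, and an integer primary key id on mytable are all hypothetical and do not come from the question. The mangling rewrites only the known term C++ so that strings like ‘fish+chips’ are left alone.

```sql
-- Hypothetical auxiliary MyISAM table holding only the search bait.
CREATE TABLE mytable_search (
    id   INT UNSIGNED NOT NULL PRIMARY KEY,  -- assumed to mirror mytable.id
    bait TEXT NOT NULL,
    FULLTEXT KEY ft_bait (bait)
) ENGINE=MyISAM;

-- Fill it with mangled text. Underscores count as word characters,
-- so 'C__plus__plus' is indexed as a single word.
-- REPLACE() is case-sensitive, so a lowercase 'c++' is not rewritten here.
INSERT INTO mytable_search (id, bait)
SELECT id,
       REPLACE(CONCAT_WS(' ', field1, field2, field3), 'C++', 'C__plus__plus')
FROM mytable;

-- Search: mangle the user's term the same way before querying,
-- then join back to the real data.
SELECT t.*
FROM mytable AS t
JOIN mytable_search AS s ON s.id = t.id
WHERE MATCH (s.bait) AGAINST ('C__plus__plus' IN BOOLEAN MODE);
```

The auxiliary table has to be kept in sync with the primary one, for example from application code on every write; that part is not shown.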

Another possibility is to detect searches that MySQL can't do, such as those with only short words or unusual characters, and fall back to a simple-but-slow LIKE or REGEXP search for those searches only. In this case you will probably also want to remove the stoplist by setting ft_stopword_file to an empty string, since it's not practical to treat every word in that list as a special case too.
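For the fallback case, the query for C++ might be written along these lines. It is only a sketch: deciding when to fall back (short words, unusual characters) is left to the application, and the columns are the ones from the question.

```sql
-- '+' has no special meaning in LIKE, so the term can be used as-is
-- (only % and _ would need escaping).
SELECT *
FROM mytable
WHERE field1 LIKE '%C++%'
   OR field2 LIKE '%C++%'
   OR field3 LIKE '%C++%';

-- REGEXP variant: '+' is a quantifier in regular expressions, so each
-- one is wrapped in a bracket expression to match it literally.
SELECT *
FROM mytable
WHERE field1 REGEXP 'C[+][+]'
   OR field2 REGEXP 'C[+][+]'
   OR field3 REGEXP 'C[+][+]';
```

Neither form can use the fulltext index, so both scan every row; that is the "slow" part of the trade-off and the reason to use them only for searches the fulltext index cannot handle.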
