How do you get your Fulltext boolean search to pick up the term C++?


Problem description

So, I need to find out how to do a fulltext boolean search on a MySQL database to return a record containing the term "C++".

I have my SQL search string as:

SELECT * 
FROM mytable 
WHERE MATCH (field1, field2, field3) 
AGAINST ("C++" IN BOOLEAN MODE) 

Although all of my fields contain the string C++, it is never returned in the search results.

How can I modify MySQL to accommodate this? Is it possible?

The only solution I have found would be to escape the + character while entering my data, as something like "__plus", and then modify my search to accommodate it, but this seems cumbersome and there has to be a better way.

Solution

How can I modify MySQL to accommodate this?

You'll have to change MySQL's idea of what a word is.

Firstly, the default minimum word length is 4. This means that no search term containing only words of fewer than 4 letters will ever match, whether that's ‘C++’ or ‘cpp’. You can configure this using the ft_min_word_len config option (http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_ft_min_word_len), e.g. in your my.cnf:

[mysqld]
ft_min_word_len=3

(Then stop/start mysqld and rebuild the fulltext indexes, for example with REPAIR TABLE mytable QUICK.)

Secondly, ‘+’ is not considered a letter by MySQL. You can make it a letter, but then that means you won't be able to search for the word ‘fish’ in the string ‘fish+chips’, so some care is required. And it's not trivial: it requires recompiling MySQL or hacking an existing character set. See the section beginning "If you want to change the set of characters that are considered word characters..." in section 11.8.6 of the doc.
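To see why both points matter for "C++", here is a rough Python approximation of the word splitting described above. It is only a sketch: the function name indexed_words is made up for illustration, it treats letters, digits and underscore as word characters, and it ignores the stoplist and everything else the real parser does.

```python
import re

def indexed_words(text, min_len=4):
    """Approximate the words a fulltext index would keep.

    Assumption: a word is a run of letters, digits or underscores,
    and words shorter than min_len are dropped (like ft_min_word_len).
    """
    return [w for w in re.findall(r"\w+", text) if len(w) >= min_len]

# '+' splits words, so 'C++' shrinks to 'C' and is then dropped as too short.
print(indexed_words("I like C++ and fish+chips"))
# Lowering the minimum length to 3 lets 'cpp' through, but still not 'C'.
print(indexed_words("I like C++ and cpp", min_len=3))
```

Under these assumptions, lowering ft_min_word_len helps with ‘cpp’ but not with ‘C++’, which is why the word-character set is the second half of the problem.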

escape the + character during the process of entering my data as something like "__plus" and then modifying my search to accommodate

Yes, something like that is a common solution: you can keep your ‘real’ data (without the escaping) in a primary, definitive table — usually using InnoDB for ACID compliance. Then an auxiliary MyISAM table can be added, containing only the mangled words for fulltext search bait. You can also do a limited form of stemming using this approach.
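A minimal sketch of the mangling step, in Python. The function name mangle and the exact rule are assumptions for illustration: it rewrites only a run of + signs at the end of a word, so that a leading + (the boolean-mode "required" operator) and a + between two words are left alone. The same function would have to be applied both to the text copied into the auxiliary table and to the search string.

```python
import re

# A run of '+' directly after a word character and not followed by
# another word character or '+', e.g. the '++' in 'C++'.
_TRAILING_PLUSES = re.compile(r"(?<=\w)\++(?![\w+])")

def mangle(text):
    """Replace trailing '+' characters with the placeholder '__plus'."""
    return _TRAILING_PLUSES.sub(lambda m: "__plus" * len(m.group()), text)

print(mangle("C++"))          # C__plus__plus
print(mangle("+C++ -java"))   # +C__plus__plus -java
print(mangle("fish+chips"))   # fish+chips (unchanged)
```

Because the placeholder uses only letters and underscores, the mangled form should be indexed as one word, and it is long enough to pass the default minimum word length.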

Another possibility is to detect searches that MySQL can't do, such as those with only short words, or unusual characters, and fall back to a simple-but-slow LIKE or REGEXP search for those searches only. In this case you will probably also want to remove the stoplist by setting ft_stopword_file (http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_ft_stopword_file) to an empty string, since it's not practical to treat every word in that list as a special case too.
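The fallback idea could look roughly like this in application code. This is a sketch under several assumptions: it handles a single search term, the helper names (needs_fallback, like_pattern, build_query) are hypothetical, FT_MIN_WORD_LEN must be kept in step with the server setting by hand, and the %s placeholders assume a driver that uses that parameter style.

```python
import re

FT_MIN_WORD_LEN = 4  # assumption: mirrors the server's ft_min_word_len

def needs_fallback(term, min_len=FT_MIN_WORD_LEN):
    """True if the term is too short or contains non-word characters."""
    return len(term) < min_len or re.fullmatch(r"\w+", term) is None

def like_pattern(term):
    """Escape LIKE wildcards and wrap the term for a substring match."""
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return "%" + escaped + "%"

def build_query(term):
    """Return (sql, params) for either the fulltext or the LIKE search."""
    if needs_fallback(term):
        p = like_pattern(term)
        sql = ("SELECT * FROM mytable "
               "WHERE field1 LIKE %s OR field2 LIKE %s OR field3 LIKE %s")
        return sql, (p, p, p)
    sql = ("SELECT * FROM mytable "
           "WHERE MATCH (field1, field2, field3) AGAINST (%s IN BOOLEAN MODE)")
    return sql, (term,)

print(build_query("C++"))       # slow LIKE path
print(build_query("database"))  # fulltext path
```

The LIKE path cannot use the fulltext index, so it is only worth taking for the searches that would otherwise return nothing.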
