MySQL Fulltext Stopwords Rationale
Problem Description
I am currently trying to develop a basic fulltext search for my website, and I noticed that certain words like "regarding" are listed as stopwords for MySQL fulltext searches. This doesn't bother me too much right now since people searching for a given news item wouldn't necessarily search using the word "regarding" (but I certainly can't speak for everyone!). However, I was hoping someone here could enlighten me about the rationale for having a stopwords list. Thanks!
For clarification: I'm using MyISAM for my fulltext table. The stopwords are words that MySQL won't index (for any fulltext index). As noted in a comment to this question, there is a full list of stopwords without any kind of explanation. I'd just like to know whether there was a rationale behind the words "they" chose.
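Solution
The stop words are just common words in the English language. Because they appear in so many documents, they do little to distinguish one document from another. In most cases, if you don't index these words, your search results will be more relevant and your indexes will be smaller and faster.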
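If it suits your needs better, you can replace the stop word list by pointing the ft_stopword_file variable at your own file, or set it to '' to disable stopword filtering so that every word at least ft_min_word_len characters long is indexed. You can also change the minimum indexed word length using the ft_min_word_len variable, which exists for the same reason. Both variables are read at server startup, and existing FULLTEXT indexes must be rebuilt after you change them.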
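As a rough sketch of how those two settings might be applied to a MyISAM table: the table name `news`, its columns `id`, `title` and `body`, the FULLTEXT index on `(title, body)`, and the stopword file path are all assumptions made for illustration, not details from the question.

```sql
-- Both variables are startup options, so they go in the server option file
-- (for example my.cnf) under [mysqld], followed by a server restart:
--   ft_min_word_len  = 3
--   ft_stopword_file = /etc/mysql/stopwords.txt   -- hypothetical path

-- After the restart, check which values are actually in effect.
SHOW VARIABLES LIKE 'ft_%';

-- Rebuild the MyISAM FULLTEXT index so it reflects the new settings.
-- `news` is a hypothetical table name.
REPAIR TABLE news QUICK;

-- Check whether a word that used to be ignored is now searchable.
-- Assumes a FULLTEXT index on (title, body); boolean mode is used so the
-- natural-language 50% threshold does not hide very common words.
SELECT id, title
FROM news
WHERE MATCH (title, body) AGAINST ('regarding' IN BOOLEAN MODE);
```

If "regarding" is absent from the custom stopword file and at least ft_min_word_len characters long, the final query should start returning rows once the index has been rebuilt.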