MySQL Full-Text search for hashtags (including the # symbol in index)


Problem description


I am pretty sure there should be a way to search for hashtags using a Full-Text index on a MyISAM table. The default setup does the following:

textfield 
hashtag
#hashtag
#two #hashtag #hashtag

SELECT * FROM table WHERE MATCH(textfield) AGAINST ('#hashtag')
> | hashtag                |
> | #hashtag               |
> | #two #hashtag #hashtag |

It should return only the 2nd and 3rd rows instead. It looks like the # symbol is treated as a word delimiter, so it is "removed" before the search begins. What should I do to enable indexing and searching for terms containing # as part of the word?
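For anyone who wants to reproduce this, here is a minimal sketch of the setup. The table name `posts` is hypothetical (the placeholder name `table` is a reserved word in MySQL), and the use of boolean mode is an assumption of this sketch: natural-language searches on MyISAM ignore words that occur in half or more of the rows, which would hide the effect on a three-row table.

```sql
-- Hypothetical table name; the column name follows the question.
CREATE TABLE posts (
  textfield TEXT,
  FULLTEXT (textfield)
) ENGINE=MyISAM;

INSERT INTO posts (textfield) VALUES
  ('hashtag'),
  ('#hashtag'),
  ('#two #hashtag #hashtag');

-- Assumption: boolean mode, to sidestep the 50% threshold that applies to
-- natural-language searches on very small MyISAM tables.
SELECT textfield
FROM posts
WHERE MATCH(textfield) AGAINST('#hashtag' IN BOOLEAN MODE);
```

With the default word-character rules, # is not part of any indexed word, so the row containing plain "hashtag" is expected to match as well.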

Solution

As documented under Fine-Tuning MySQL Full-Text Search:

You can change the set of characters that are considered word characters in several ways, as described in the following list. After making the modification, rebuild the indexes for each table that contains any FULLTEXT indexes. Suppose that you want to treat the hyphen character ('-') as a word character. Use one of these methods:

  • Modify the MySQL source: In storage/myisam/ftdefs.h, see the true_word_char() and misc_word_char() macros. Add '-' to one of those macros and recompile MySQL.

  • Modify a character set file: This requires no recompilation. The true_word_char() macro uses a "character type" table to distinguish letters and numbers from other characters. You can edit the contents of the <ctype><map> array in one of the character set XML files to specify that '-' is a "letter." Then use the given character set for your FULLTEXT indexes. For information about the <ctype><map> array format, see Section 10.3.1, "Character Definition Arrays".

  • Add a new collation for the character set used by the indexed columns, and alter the columns to use that collation. For general information about adding collations, see Section 10.4, "Adding a Collation to a Character Set". For an example specific to full-text indexing, see Section 12.9.7, "Adding a Collation for Full-Text Indexing".
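Applied to the hashtag case, the third method and the index rebuild might look roughly like the sketch below. It assumes a custom collation, here called `latin1_fulltext_ci`, has already been added to the server's character set files so that '#' counts as a letter; that collation name and the table name `posts` are placeholders, not something the question provides.

```sql
-- Check that the (hypothetical) custom collation is known to the server.
SHOW COLLATION LIKE 'latin1_fulltext_ci';

-- Method 3: switch the indexed column to the custom collation.
ALTER TABLE posts
  MODIFY textfield TEXT CHARACTER SET latin1 COLLATE latin1_fulltext_ci;

-- Methods 1 and 2 change the server rather than the table, so the existing
-- MyISAM FULLTEXT index has to be rebuilt afterwards.
REPAIR TABLE posts QUICK;

-- Re-run the search from the question.
SELECT * FROM posts WHERE MATCH(textfield) AGAINST('#hashtag');
```

Whichever method is used, the search only changes once the index has been rebuilt with the new word-character rules, since the old index entries were tokenized without the # symbol.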
