Storing words with apostrophe in Lucene index
Question
I have a company field in a Lucene index. One of the indexed company names is: Moody's
When a user types any of the following keywords, I want this company to come up in the search results: 1. Moo 2. Mood 3. Moodys 4. Moody's
How should I store this index in Lucene and what type of Lucene Query should I use to get this behaviour?
Thanks.
Recommended answer
Based on your clarifications, I want to divide your question into two, and answer each in turn:
1. How do I index words with apostrophes as equivalent to similar words without an apostrophe? e.g. mapping Moodys and Moody's to the same index term.
2. How do I implement auto-complete search in Lucene, i.e. given an index, find documents using word prefixes, e.g. map Moo to Moodys?
1 is relatively easy: use a StandardTokenizer to create a token combining the apostrophe and s with the previous word, then a StandardFilter to remove the apostrophe and s. This will convert Moody's to Moody. A StandardAnalyzer does this and much more (lowercasing and stop word removal), which may be more than you need. Using a stemmer should then take both Moodys and Moody to the same token; try SnowballFilter for this.
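To make the intended normalization concrete, here is a minimal plain-Java sketch of what the filter chain is meant to achieve. It deliberately does not call Lucene: `PossessiveNormalizer`, `stripPossessive` and `crudeStem` are hypothetical names invented for illustration, and `crudeStem` is only a crude stand-in for a real stemmer. In an actual analyzer chain the possessive stripping would be done by a token filter (in more recent Lucene releases, as far as I know, `ClassicFilter` or `EnglishPossessiveFilter` rather than `StandardFilter`).

```java
import java.util.Locale;

// Illustration only: mimics "strip possessive, then stem" without using Lucene.
class PossessiveNormalizer {

    // Lowercase the token and drop a trailing possessive 's
    // (straight or typographic apostrophe).
    static String stripPossessive(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        if (t.endsWith("'s") || t.endsWith("\u2019s")) {
            t = t.substring(0, t.length() - 2);
        }
        return t;
    }

    // Assumption: a very rough substitute for a stemmer. It only drops a
    // single trailing "s" from words longer than three characters.
    static String crudeStem(String token) {
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // The same normalization must be applied at index time and at query time.
    static String normalize(String token) {
        return crudeStem(stripPossessive(token));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Moody's", "Moodys", "Moody"}) {
            System.out.println(s + " -> " + normalize(s));
        }
    }
}
```

All three inputs end up as the term `moody`, which is why a query for Moodys or Moody's can hit a document indexed from Moody's, provided the same analysis runs on both sides.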
2 is harder: Lucene's PrefixQuery, to which Alan alluded, will only work when the company name is the first word in a field. You need something like the answer to this question about auto-complete in Lucene.
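One common way to get prefix matching on any word of a field is to index every leading prefix of each word (edge n-grams) and then look the query up as an exact term. The sketch below shows that idea in plain Java with an in-memory map; it is not the linked answer's code and not Lucene's API. `EdgePrefixIndex`, `MIN_GRAM` and the sample company names are assumptions made for illustration, and the normalization here simply drops apostrophes instead of stemming. In Lucene itself, a token filter such as `EdgeNGramTokenFilter` applied at index time plays the role of the inner loop.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: a toy inverted index keyed by word prefixes.
class EdgePrefixIndex {
    private static final int MIN_GRAM = 2; // assumed minimum prefix length
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Lowercase and drop apostrophes, so "Moody's" and "Moodys"
    // both become "moodys".
    static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT)
                .replace("'", "")
                .replace("\u2019", "");
    }

    // Index every prefix (length >= MIN_GRAM) of every whitespace-separated
    // word, so a match does not depend on the word being first in the field.
    void add(int docId, String companyName) {
        for (String word : companyName.trim().split("\\s+")) {
            String term = normalize(word);
            for (int end = MIN_GRAM; end <= term.length(); end++) {
                postings.computeIfAbsent(term.substring(0, end), k -> new TreeSet<>())
                        .add(docId);
            }
        }
    }

    // The query is normalized the same way and looked up as one exact term.
    Set<Integer> search(String query) {
        return postings.getOrDefault(normalize(query), Collections.emptySet());
    }

    public static void main(String[] args) {
        EdgePrefixIndex index = new EdgePrefixIndex();
        index.add(1, "Moody's");
        index.add(2, "Acme Moody's Holdings"); // hypothetical company name
        index.add(3, "Monsanto");
        for (String q : new String[] {"Moo", "Mood", "Moodys", "Moody's"}) {
            System.out.println(q + " -> " + index.search(q));
        }
    }
}
```

With this layout all four keywords from the question resolve to documents 1 and 2, including the one where the name is not the first word. The trade-off is a larger index, since each word contributes one term per prefix.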