Lucene Synonym Filter behavior
Question
I am trying to figure out how Lucene's analyzers work. My question is: how does Lucene handle synonyms? Here is the situation: we have single-word and multi-word synonyms.
single word: foo = bar
multi word: foo bar = foobar
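For single words:
- Does Lucene expand the index records? I assume that if the query contains a word like "foo", it also adds "bar" to the query. I don't know whether the same happens at indexing time.
For multi words:
- Does Lucene expand both at query time and at indexing time? For example, if we have "foo bar", does it add "foobar" to the index/query?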
My second question: Lucene takes a stream of tokens and passes it through filters such as the lowercase filter. How does Lucene detect multi-word synonyms in that stream? For example, how does it find out that "foo bar" is a group of words that belong together?
Thanks
Recommended answer
SynonymFilter can optionally keep the original word and add the synonym to the token stream as well, by setting keepOrig=true (see SynonymMap.Builder.add()). This behavior can cause problems for PhraseQueries and the like; see the first Note in the SynonymFilter docs.
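To make the effect of keepOrig concrete, here is a toy model in plain Java. It is only a sketch and does not use Lucene: the class `KeepOrigSketch`, the method `expand`, and the `term@position` output format are all made up for illustration. It assumes the synonym is emitted at the same position as the original token, which is the usual way stacked synonyms are represented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Toy model only: NOT Lucene code. All names here are hypothetical.
final class KeepOrigSketch {

    // Returns tokens as "term@position". A synonym shares the position
    // of the token it was derived from.
    static List<String> expand(List<String> tokens,
                               Map<String, String> synonyms,
                               boolean keepOrig) {
        List<String> out = new ArrayList<>();
        for (int pos = 0; pos < tokens.size(); pos++) {
            String term = tokens.get(pos);
            String synonym = synonyms.get(term);
            // Keep the original if no rule matched, or if keepOrig is set.
            if (synonym == null || keepOrig) {
                out.add(term + "@" + pos);
            }
            // Emit the synonym at the same position as the original.
            if (synonym != null) {
                out.add(synonym + "@" + pos);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "foo", "b");
        Map<String, String> rules = Collections.singletonMap("foo", "bar");
        System.out.println(expand(tokens, rules, true));   // original and synonym
        System.out.println(expand(tokens, rules, false));  // synonym only
    }
}
```

With keepOrig enabled, both "foo" and "bar" sit at position 1, so a search for either term can match the same document. With it disabled, only "bar" remains in the stream.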
If you use the same Analyzer for querying and indexing, then both queries and docs written to the index will, of course, be treated the same way. SynonymFilter with keepOrig set to true is one of the few analysis components that is fairly often applied differently at query time and at index time, but that is entirely up to your implementation.
As far as how it is implemented, the source code is available to you.
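For the multi-word part of the question, the general idea can be sketched without reading the Lucene source. The following is a simplified illustration, not Lucene's actual implementation: the class `MultiWordSketch` and its `apply` method are hypothetical names, and the sketch always replaces the matched tokens (as if keepOrig were false). It assumes greedy matching where the rule covering the most tokens wins, which is how the SynonymFilter documentation describes rule application.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration only: NOT Lucene's implementation.
final class MultiWordSketch {

    // Rules map a sequence of input tokens to a single replacement token.
    static List<String> apply(List<String> tokens, Map<List<String>, String> rules) {
        // Longest rule length bounds how far we need to look ahead.
        int maxLen = 0;
        for (List<String> key : rules.keySet()) {
            maxLen = Math.max(maxLen, key.size());
        }

        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            String replacement = null;
            // Try the longest window first so "foo bar" beats a rule for "foo".
            for (int len = Math.min(maxLen, tokens.size() - i); len >= 1; len--) {
                String candidate = rules.get(tokens.subList(i, i + len));
                if (candidate != null) {
                    matched = len;
                    replacement = candidate;
                    break;
                }
            }
            if (replacement != null) {
                out.add(replacement);   // consume the whole matched window
                i += matched;
            } else {
                out.add(tokens.get(i)); // no rule starts here, pass through
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<List<String>, String> rules = new HashMap<>();
        rules.put(Arrays.asList("foo", "bar"), "foobar");
        rules.put(Arrays.asList("foo"), "bar");
        System.out.println(apply(Arrays.asList("a", "foo", "bar", "b", "foo"), rules));
    }
}
```

The point is that the filter does not need to know in advance that "foo bar" is a phrase. It sees tokens one at a time, buffers a few tokens of lookahead, and checks whether the buffered sequence matches any rule.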