Lucene Synonym Filter behavior

Question

I am trying to figure out how Lucene's analyzer works. My question is: how does Lucene handle synonyms? Here is the situation: we have single-word and multi-word synonyms:

  • single word: foo = bar
  • multi word: foo bar = foobar

For single words:

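  • Does Lucene expand the indexed records? I think that if a query contains a word like "foo", Lucene also adds "bar" to the query, but I don't know whether the same happens at indexing time.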

For multi words:

  • Does Lucene expand both the query and the index? For example, if we have "foo bar", will "foobar" be added to the index/query?

My second question: Lucene takes a stream of tokens and passes them through filters such as the lowercase filter. How does Lucene find multi-word synonyms? That is, how does it figure out that "foo bar" is a multi-word term whose words belong together?

Thanks

Answer

SynonymFilter can optionally keep the original word and add the synonym to the token stream alongside it, by setting keepOrig to true (the includeOrig argument of SynonymMap.Builder.add()). This behavior can cause problems for PhraseQuery and similar queries; see the first Note in the SynonymFilter docs.
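
To make "adding the synonym to the token stream" concrete, here is a minimal Java sketch. It is a toy model, not Lucene's SynonymFilter: the class and method names (SynonymKeepOrigDemo, expand) are hypothetical, the rules are single-word and one-directional, and a token is reduced to its text plus a position increment. With keepOrig on, the synonym is emitted with a position increment of 0, so it occupies the same position as the original word; with keepOrig off, the synonym simply replaces the word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model, not Lucene code: a synonym is "stacked" on the original
// token by giving it a position increment of 0.
public class SynonymKeepOrigDemo {

    // A token: its text plus a position increment, loosely mirroring
    // Lucene's CharTermAttribute + PositionIncrementAttribute pair.
    record Token(String term, int posInc) {
        @Override
        public String toString() {
            return term + "(+" + posInc + ")";
        }
    }

    // Hypothetical helper: one-directional, single-word rules only (e.g. foo -> bar).
    // keepOrig decides whether the original word survives next to its synonym.
    static List<Token> expand(List<String> words, Map<String, String> synonyms, boolean keepOrig) {
        List<Token> out = new ArrayList<>();
        for (String w : words) {
            String syn = synonyms.get(w);
            if (syn == null) {
                out.add(new Token(w, 1));       // no rule: pass through
            } else if (keepOrig) {
                out.add(new Token(w, 1));       // original advances the position
                out.add(new Token(syn, 0));     // synonym shares that position
            } else {
                out.add(new Token(syn, 1));     // synonym replaces the original
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> synonyms = Map.of("foo", "bar");
        List<String> words = List.of("red", "foo", "house");
        System.out.println(expand(words, synonyms, true));  // [red(+1), foo(+1), bar(+0), house(+1)]
        System.out.println(expand(words, synonyms, false)); // [red(+1), bar(+1), house(+1)]
    }
}
```

Because bar shares foo's position, a document analyzed this way matches a query for either word, and a phrase such as "bar house" still lines up. As I understand the note in the docs, the trouble comes mainly from multi-word rules, whose positions a flat stream like this one cannot represent exactly.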

If you use the same Analyzer for querying and indexing, then queries and the documents written to the index are, of course, processed the same way. SynonymFilter with keepOrig set to true is one of the few analysis components that are fairly often applied asymmetrically, at query time only or at index time only, but that choice is entirely up to your implementation.
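
Why applying the filter at only one end can still work is easiest to see with a symmetric rule set in which foo and bar each map to the other. This second toy sketch uses plain Java collections instead of Lucene's Analyzer and index, and its names (WhereToExpandDemo, analyze, matches, SYNONYMS) are hypothetical. It checks whether the query "foo" finds a document that only contains "bar" when expansion runs at index time, at query time, both, or neither.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model, not Lucene code: compares expanding synonyms at index time
// versus query time.
public class WhereToExpandDemo {

    // Hypothetical symmetric rule set: foo and bar are interchangeable.
    static final Map<String, String> SYNONYMS = Map.of("foo", "bar", "bar", "foo");

    // Lowercasing whitespace "tokenizer" plus optional keepOrig-style expansion.
    static List<String> analyze(String text, boolean expandSynonyms) {
        List<String> terms = new ArrayList<>();
        for (String w : text.toLowerCase().split("\\s+")) {
            terms.add(w);
            String syn = SYNONYMS.get(w);
            if (expandSynonyms && syn != null) {
                terms.add(syn); // keep the original and add the synonym
            }
        }
        return terms;
    }

    // A document matches if any analyzed query term is among its indexed terms.
    static boolean matches(String doc, String query, boolean atIndexTime, boolean atQueryTime) {
        Set<String> indexed = new HashSet<>(analyze(doc, atIndexTime));
        for (String q : analyze(query, atQueryTime)) {
            if (indexed.contains(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "Bar baz";
        String query = "foo";
        System.out.println(matches(doc, query, false, false)); // false
        System.out.println(matches(doc, query, true, false));  // true
        System.out.println(matches(doc, query, false, true));  // true
        System.out.println(matches(doc, query, true, true));   // true
    }
}
```

Either side is enough here, so the decision is a trade-off: expanding at index time makes the index larger and means reindexing whenever the synonym list changes, while expanding only at query time leaves the index untouched.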

As for how it is implemented, the source code is available to you.
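
For the second question (how multi-word synonyms are found), the key point is that the tokenizer has already split "foo bar" into two tokens, so the filter has to look ahead at consecutive tokens. In Lucene, SynonymMap stores the rules in an FST, and SynonymFilter buffers incoming tokens and, per its javadoc, parses greedily: the rule that starts earliest and matches the most tokens wins. This sketch only imitates that idea with a HashMap keyed on token lists. MultiWordSynonymDemo, rewrite, RULES and MAX_RULE_LEN are hypothetical names, and the sketch drops keepOrig, so matched tokens are replaced rather than stacked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model, not Lucene code: finds multi-word synonyms by looking ahead
// at consecutive tokens and taking the longest rule that matches.
public class MultiWordSynonymDemo {

    // Rules keyed by their token sequence (a HashMap stand-in for Lucene's FST).
    static final Map<List<String>, String> RULES = Map.of(
            List.of("foo"), "bar",
            List.of("foo", "bar"), "foobar");

    // Length, in tokens, of the longest left-hand side; bounds the lookahead.
    static final int MAX_RULE_LEN = 2;

    static List<String> rewrite(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matchedLen = 0;
            String replacement = null;
            // Try the longest window first so "foo bar" wins over "foo".
            for (int len = Math.min(MAX_RULE_LEN, tokens.size() - i); len >= 1; len--) {
                String r = RULES.get(tokens.subList(i, i + len));
                if (r != null) {
                    matchedLen = len;
                    replacement = r;
                    break;
                }
            }
            if (replacement == null) {
                out.add(tokens.get(i)); // no rule starts here: pass the token through
                i++;
            } else {
                out.add(replacement);   // consume the matched tokens, emit the synonym
                i += matchedLen;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(rewrite(List.of("foo", "bar", "baz"))); // [foobar, baz]
        System.out.println(rewrite(List.of("foo", "baz")));        // [bar, baz]
    }
}
```

The lookahead window is what lets the filter treat "foo bar" as one unit even though no single token contains both words; a real implementation also has to hold buffered tokens back until it knows whether a longer rule will match.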
