Lucene Proximity Search for phrase with more than two words

Question

Lucene's manual clearly explains what a proximity search means for a two-word phrase, such as the "jakarta apache"~10 example in http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Proximity Searches

However, I am wondering what exactly a search like "jakarta apache lucene"~10 does. Does it allow neighboring words to be at most 10 words apart, or all pairs of words to be at most 10 words apart?

Thanks!

Recommended answer

The slop (proximity) works like an edit distance (see PhraseQuery.setSlop). The terms can be reordered, or extra terms can appear between them. The slop is therefore the maximum number of extra terms allowed across the whole phrase, not a separate limit for each pair of neighboring words. That is:

"jakarta apache lucene"~3

will match:

  • "jakarta lucene apache" (distance: 2)
  • "jakarta extra words here apache lucene" (distance: 3)
  • "jakarta some words apache separated lucene" (distance: 3)

but not:

  • "lucene jakarta apache" (distance: 4)
  • "jakarta too many extra words here apache lucene" (distance: 5)
  • "jakarta some words apache further separated lucene" (distance: 4)
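The in-order examples above can be checked with a small sketch. It is not Lucene's implementation. It only models the "extra terms added" reading of slop for matches where the phrase terms appear in their original order, and it assumes each phrase term occurs at most once in the text. The function name `in_order_extra_terms` is hypothetical, and reordered matches are deliberately left out.

```python
def in_order_extra_terms(phrase, text):
    """Count the extra tokens between the first and last phrase term.

    Illustrative model only (not Lucene code). Returns None when a term
    is missing or when the terms do not appear in phrase order.
    Assumes each phrase term occurs at most once in the text.
    """
    terms = phrase.split()
    tokens = text.split()
    positions = []
    for term in terms:
        if term not in tokens:
            return None
        positions.append(tokens.index(term))
    if positions != sorted(positions):
        return None  # reordered terms are outside this sketch
    # Tokens covered by the match, minus the phrase terms themselves.
    span = positions[-1] - positions[0] + 1
    return span - len(terms)


phrase = "jakarta apache lucene"
slop = 3
for text in [
    "jakarta extra words here apache lucene",
    "jakarta some words apache separated lucene",
    "jakarta too many extra words here apache lucene",
    "jakarta some words apache further separated lucene",
]:
    distance = in_order_extra_terms(phrase, text)
    print(text, "->", distance, "match" if distance <= slop else "no match")
```

The count is taken over the whole phrase, so three extra words match with ~3 whether they sit in one gap or are spread over both gaps.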

Some people have been confused by:

"lucene jakarta apache" (distance: 4)

The simple explanation is that swapping two terms takes two edits, so:

  1. jakarta apache lucene (distance: 0)
  2. jakarta lucene apache (first swap, distance: 2)
  3. lucene jakarta apache (second swap, distance: 4)

The longer, but more accurate, explanation is that every edit allows a term to be moved by one position. The first move of a swap places two terms on top of each other, at the same position. Keeping this in mind explains why any set of three terms can be rearranged into any order with a distance no greater than 4.

  1. jakarta apache lucene (distance: 0)
  2. jakarta [apache,lucene] (distance: 1)
  3. [jakarta,apache,lucene] (all transposed at the same position, distance: 2)
  4. lucene [jakarta,apache] (distance: 3)
  5. lucene jakarta apache (distance: 4)
