Lucene Proximity Search for phrase with more than two words


Question

Lucene's manual clearly explains what a proximity search means for a two-word phrase, such as the "jakarta apache"~10 example in http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Proximity Searches

However, what exactly does a search like "jakarta apache lucene"~10 do? Does it allow neighboring words to be at most 10 words apart, or does that limit apply to every pair of words?

Thanks!

Recommended answer

The slop (proximity) works like an edit distance (see PhraseQuery.setSlop), so the terms can be reordered or have extra terms inserted between them. This means the proximity is the maximum number of extra terms allowed across the whole phrase, not between each pair of neighboring terms. That is:

"jakarta apache lucene"~3

will match:

  • "jakarta lucene apache" (distance: 2)
  • "jakarta extra words here apache lucene" (distance: 3)
  • "jakarta some words apache separated lucene" (distance: 3)

but not:

  • "lucene jakarta apache" (distance: 4)
  • "jakarta too many extra words here apache lucene" (distance: 5)
  • "jakarta some words apache further separated lucene" (distance: 4)
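
The in-order examples in the two lists above can be checked with a small sketch. The Java below is a hypothetical helper (the names InOrderSlop, extraTerms, and words are made up for illustration and are not Lucene API). It assumes whitespace-separated tokens and a non-empty phrase, and it only covers documents where the phrase terms appear in query order, counting the extra words that sit between them. Reordered documents are out of scope and yield an empty result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical helper, not part of Lucene: counts the extra words that sit
// between phrase terms when the terms appear in the document in query order.
final class InOrderSlop {

    // Splits on single spaces; a stand-in for a real analyzer (assumption).
    static List<String> words(String text) {
        return Arrays.asList(text.split(" "));
    }

    // Returns the number of extra tokens inside the tightest in-order match,
    // or empty if the terms never all occur in query order.
    static OptionalInt extraTerms(List<String> phrase, List<String> doc) {
        if (phrase.isEmpty()) {
            return OptionalInt.empty();
        }
        int best = -1;
        for (int start = 0; start < doc.size(); start++) {
            if (!doc.get(start).equals(phrase.get(0))) {
                continue;
            }
            int pos = start;
            int next = 1;
            // Take the earliest occurrence of each following term.
            while (next < phrase.size()) {
                pos++;
                while (pos < doc.size() && !doc.get(pos).equals(phrase.get(next))) {
                    pos++;
                }
                if (pos >= doc.size()) {
                    break;
                }
                next++;
            }
            if (next == phrase.size()) {
                int extras = (pos - start + 1) - phrase.size();
                if (best < 0 || extras < best) {
                    best = extras;
                }
            }
        }
        return best < 0 ? OptionalInt.empty() : OptionalInt.of(best);
    }

    public static void main(String[] args) {
        List<String> phrase = words("jakarta apache lucene");
        int slop = 3;
        String[] docs = {
            "jakarta extra words here apache lucene",
            "jakarta some words apache separated lucene",
            "jakarta too many extra words here apache lucene",
            "jakarta some words apache further separated lucene"
        };
        for (String doc : docs) {
            int extras = extraTerms(phrase, words(doc)).getAsInt();
            System.out.println(doc + " -> extras=" + extras
                    + ", within slop " + slop + ": " + (extras <= slop));
        }
    }
}
```

With a slop of 3, the first two documents come out at 3 extra words and fall within the limit, while the last two come out at 5 and 4 and do not.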

Some people are confused by this one:

"lucene jakarta apache" (distance: 4)

The simple explanation is that swapping two terms takes two edits, so:

  1. jakarta apache lucene (distance: 0)
  2. jakarta lucene apache (first swap, distance: 2)
  3. lucene jakarta apache (second swap, distance: 4)

The longer, but more accurate, explanation is that every edit moves a term by one position. The first move of a swap places the two terms on top of each other at the same position. Keeping this in mind explains why any set of three terms can be rearranged into any order with a distance no greater than 4.

  1. jakarta apache lucene (distance: 0)
  2. jakarta [apache,lucene] (distance: 1)
  3. [jakarta,apache,lucene] (all transposed onto the same position, distance: 2)
  4. lucene [jakarta,apache] (distance: 3)
  5. lucene jakarta apache (distance: 4)
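
The counting in the steps above can be written down as a short sketch. The Java below is only an illustration of that walkthrough: it gives each term a fixed slot and adds up how many single-position moves take every term from its slot in the query to its slot in the target order. The names TermMoves, slots, and moves are hypothetical, the terms are assumed to be distinct, and this is not Lucene's scorer, so borderline cases are worth verifying against a real index.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the walkthrough's counting, not Lucene code.
final class TermMoves {

    // Assigns slot 0, 1, 2, ... to the terms in the order given.
    // Assumes every term is distinct.
    static Map<String, Integer> slots(String... terms) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < terms.length; i++) {
            result.put(terms[i], i);
        }
        return result;
    }

    // Sums the single-position moves needed to take each term from its
    // slot in `from` to its slot in `to`.
    static int moves(Map<String, Integer> from, Map<String, Integer> to) {
        int total = 0;
        for (Map.Entry<String, Integer> entry : from.entrySet()) {
            Integer target = to.get(entry.getKey());
            if (target == null) {
                throw new IllegalArgumentException("missing term: " + entry.getKey());
            }
            total += Math.abs(target - entry.getValue());
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> query = slots("jakarta", "apache", "lucene");
        // One swap: apache and lucene each move by one slot.
        System.out.println(moves(query, slots("jakarta", "lucene", "apache"))); // 2
        // The order from the walkthrough: lucene moves two slots, the others one each.
        System.out.println(moves(query, slots("lucene", "jakarta", "apache"))); // 4
    }
}
```

The five numbered steps correspond to applying these moves one at a time, which is why the running distance increases by 1 at each step.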
