Elasticsearch word proximity

Problem description

In Elasticsearch, is there a way to increase the score of documents where the query words are close to each other in the document? It's not only about words that are directly adjacent, which could be solved by using shingles, but about words that are in proximity, where there might be another unimportant word in between.

Example:

Document 1:

close words in documents detection

Document 2:

close words in detection documents

Query:

close documents

So I'd like to get a higher score for the first document and a lower score for the second.

If those words were immediately next to each other, I'd use shingles with two- or three-word tokens. This approach, however, doesn't account for words that are merely close to each other.

Recommended answer

The following query is a modified form of one in the Elasticsearch docs and should meet the requirements. It uses the Elasticsearch proximity feature known as "match phrase" (the match_phrase query).

POST /my_index/my_type/_search
{
  "query": {
    "match_phrase": {
      "text": {
        "query": "close documents",
        "slop": 50
      }
    }
  }
}

The slop parameter above controls how far apart the terms may be for the document to be considered a match at all. Technically, it is the number of position moves needed to line the terms up as they appear in the query, so it gets more complex with more words in the query, but with two terms it simplifies to the distance between them. Within that limit, documents where the terms are closer together should rank higher, which is what we want.
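
To make the "number of moves" idea concrete, here is a small Python sketch that estimates the slop each example document needs. It is a simplified model and not Elasticsearch's actual implementation: the function names min_slop and proximity_weight are made up for illustration, tokenization is a plain whitespace split, and the 1 / (1 + distance) weight is an assumption about how the phrase frequency typically shrinks with distance.

```python
from itertools import product


def min_slop(document: str, query: str):
    """Return the smallest slop that lets the query phrase match, or None.

    Simplified model: each query term is shifted back by its offset in the
    query; the spread of the shifted positions is the number of moves needed.
    """
    doc_tokens = document.lower().split()
    query_tokens = query.lower().split()

    # All document positions at which each query term occurs.
    positions = [
        [i for i, tok in enumerate(doc_tokens) if tok == term]
        for term in query_tokens
    ]
    if any(not p for p in positions):
        return None  # a phrase query needs every term to be present

    best = None
    for combo in product(*positions):
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        if best is None or spread < best:
            best = spread
    return best


def proximity_weight(slop_needed: int) -> float:
    """Hypothetical closeness weight: smaller distance gives a larger weight."""
    return 1.0 / (1.0 + slop_needed)


doc1 = "close words in documents detection"
doc2 = "close words in detection documents"

for name, doc in (("doc1", doc1), ("doc2", doc2)):
    needed = min_slop(doc, "close documents")
    print(name, needed, round(proximity_weight(needed), 3))
# doc1 2 0.333
# doc2 3 0.25
```

Under this model both documents fall well within a slop of 50, so both match, and the first one gets the larger weight because its terms need fewer moves. Real scores also depend on other factors such as field length and term statistics, so treat the numbers only as an illustration of the ordering.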
