Scoring documents by both textual match and distance to a point

Problem description

I have an ElasticSearch index with a list of "shops".

I'd like to allow customers to search these shops by both geo_distance (so, search for a point and get shops near that location), and textual match, like matches on shop name / address.

I'd like to get results that match either of these two criteria, and I'd like the order of these results to be a combination of both. The stronger the textual match, and the closer to the point searched, the higher the result. (Obviously, there's going to be a formula to combine these two, that'll need tweaking, not too worried about that part yet).

My issue / what I've tried:

  • geo_distance is a filter, not a query, so I can't combine both on the query part of the request.

  • I can use a bool => should filter (rather than query) that matches on either name or location. This gives me the results I want, but not in order.

  • I can also have _geo_distance as part of a sort clause so that documents closer to the point rank higher.

What I haven't figured out is how I would take the "regular" _score that ElasticSearch gives to documents when doing textual matches, and combine that with the geo_distance score.

By having the textual match in the filter, it doesn't seem to affect the score of documents (which makes sense). And I don't see how I could combine the textual match in the query part and a geo_distance filter so it's an OR rather than an AND.

I guess my best bet would be the equivalent of this:

{
  function_score: {
    query: {  ... },
    functions: [
      { geo_distance function },
      { multi_match_result score },
    ],
    score_mode: 'multiply'
  }
}

but I'm not sure you can do geo_distance as a score function, and I don't know how to have multi_match_result score as a score function, or if it's even possible.

Any pointers will be greatly appreciated.

I'm working with ElasticSearch v1.4, but I can upgrade if necessary.

Solution

"I'm not sure you can do geo_distance as a score function, and I don't know how to have multi_match_result score as a score function, or if it's even possible."

You can't really do it in the way that you're asking, but you can do what you want just as easily. For the simpler case, you get scoring just by using a normal query.

The problem with filters is that they're yes/no questions, so if you use them in a function_score, then it either boosts the score or it doesn't. What you probably want is degradation of the score as the distance from the origin grows. It's the yes/no nature that stops them from impacting the score at all. There's no improvement to relevancy implied by matching a filter -- it just means that it's part of the answer, but it doesn't make sense to say that it should be closer to the top/bottom as a result.

This is where the Decay function score helps. It works with numbers, dates, and -- most helpfully here -- geo_points. In addition to the types of data it accepts, it can decay using either gaussian, exponential, or linear decay functions. Which one you choose is honestly arbitrary; pick the one that gives the best "experience". I would suggest starting with gauss.

"function_score": {
  "functions": [
    {
      "gauss": {
        "my_geo_point_field": {
          "origin": "0, 1",
          "scale": "5km",
          "offset": "500m",
          "decay": 0.5
        }
      }
    }
  ]
}

Note that an origin written as a string is read as "lat, lon", so "0, 1" means latitude 0, longitude 1. Only the array form, [lon, lat], uses the x, y order of standard GeoJSON.

Each of the values affects how the score decays, as illustrated by the decay-curve graph in the documentation. If you use an offset of 0, the score begins to drop as soon as a document is not exactly at the origin. A non-zero offset gives a buffer within which documents are considered just as good as one at the origin.

The scale is directly associated with the decay: the score is multiplied by the decay value once a document is scale-distance beyond the offset. In my above example, anything 5.5km from the origin (the 5km scale plus the 500m offset) would get half the score of anything at the origin.
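To make the interaction of scale, offset, and decay concrete, here is a minimal Python sketch of the gaussian decay multiplier as the Elasticsearch documentation describes it. It runs outside Elasticsearch, works on distances in metres, and the function name `gauss_decay` is only illustrative.

```python
import math


def gauss_decay(distance_m, scale_m, offset_m=0.0, decay=0.5):
    """Gaussian decay multiplier for a document distance_m metres from the origin."""
    # Anything inside the offset counts as if it were exactly at the origin.
    effective = max(0.0, distance_m - offset_m)
    # sigma^2 is chosen so the multiplier equals `decay` at scale_m past the offset.
    sigma_sq = -(scale_m ** 2) / (2.0 * math.log(decay))
    return math.exp(-(effective ** 2) / (2.0 * sigma_sq))


# Values from the answer: scale 5km, offset 500m, decay 0.5.
for d in (0, 500, 3000, 5500, 10500):
    print(f"{d:>6} m -> {gauss_decay(d, scale_m=5000, offset_m=500):.4f}")
```

With these parameters the multiplier stays at 1 out to 500m, reaches 0.5 at 5.5km, and keeps shrinking beyond that.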

Again, just note that the different types of decay functions change the shape of scoring.
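As a rough illustration of how the shape differs, the following Python sketch implements the exponential and linear variants following the formulas given in the Elasticsearch documentation. It is a standalone approximation for experimenting with parameters, not Elasticsearch code, and the function names are made up for this example.

```python
import math


def exp_decay(distance, scale, offset=0.0, decay=0.5):
    """Exponential decay: drops quickly at first, then flattens into a long tail."""
    effective = max(0.0, distance - offset)
    lam = math.log(decay) / scale  # negative, because decay < 1
    return math.exp(lam * effective)


def linear_decay(distance, scale, offset=0.0, decay=0.5):
    """Linear decay: a straight line that reaches exactly 0 and stays there."""
    effective = max(0.0, distance - offset)
    s = scale / (1.0 - decay)  # distance past the offset at which the value hits 0
    return max((s - effective) / s, 0.0)


# Both give `decay` at one scale past the origin, but differ everywhere else.
for d in (0, 2500, 5000, 10000):
    print(d, round(exp_decay(d, 5000), 4), round(linear_decay(d, 5000), 4))
```

One practical difference: with linear decay, documents far enough away get a multiplier of exactly 0, whereas the exponential (and gaussian) curves never quite reach 0.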

"I'd like the order of these results to be a combination of both."

This is the purpose of the bool / should compound query. You get OR behavior with score improvement based on each match. Combining this with the above, you'd want something like:

{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": { ... }
        },
        {
          "function_score": {
            "functions": [
              {
                "gauss": {
                  "my_geo_point_field": {
                    "origin": "0, 1",
                    "scale": "5km",
                    "offset": "500m",
                    "decay": 0.5
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}
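If the request body is assembled in application code, a small helper along these lines may be convenient. This is only a sketch: the `name` and `address` fields, the search text, and the coordinates are hypothetical placeholders, `my_geo_point_field` is the placeholder field from the example above, and the origin is passed as an explicit lat/lon object to sidestep any ordering ambiguity.

```python
import json


def build_shop_query(text, lat, lon, scale="5km", offset="500m", decay=0.5):
    """Build a bool/should body combining a text match with a gauss geo decay."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Hypothetical field names; replace with the index's own fields.
                    {"multi_match": {"query": text, "fields": ["name", "address"]}},
                    {
                        "function_score": {
                            "functions": [
                                {
                                    "gauss": {
                                        "my_geo_point_field": {
                                            "origin": {"lat": lat, "lon": lon},
                                            "scale": scale,
                                            "offset": offset,
                                            "decay": decay,
                                        }
                                    }
                                }
                            ]
                        }
                    },
                ]
            }
        }
    }


# Illustrative values only.
body = build_shop_query("coffee", lat=1.0, lon=0.0)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request; how it is sent (HTTP client, official client library) is left open here.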

NOTE: If you add a must, then the should behavior changes from literal OR-like behavior (at least 1 must match) to completely optional behavior (none must match).

"I'm working with ElasticSearch v1.4, but I can upgrade if necessary."

Starting with Elasticsearch 2.0, every filter is a query and every query is also a filter. The only difference is the context that it's used in. This doesn't change my answer here, but it's something that may help you in the future in addition to what I say next.

Geo-related performance increased dramatically in ES 2.2+. You should upgrade (and recreate your geo-related indices) to take advantage of those changes. ES 5.0 will have similar benefits!
