Scoring in elasticsearch percolate-response

Question

I am using the percolate feature of elasticsearch. It works well: I get the matching percolate IDs back for a new document and can basically build an inverse search. So far, so good.

Here comes the problem: I want a score expressing how well the given document matches a percolator's query (exactly the score a normal query would give me). To get this I added track_scores, but had no luck.

I found this in the documentation about track_scores:

...The score is based on the query and represents how the query matched to the percolate query’s metadata and not how the document being percolated matched to the query...

Is what I want/need even possible?

Here is a sample demonstrating the problem (taken from elasticsearch.org). The score returned in the percolate response is always 1.0, regardless of the input document:

# Index the percolator
curl -XPUT 'localhost:9200/my-index/.percolator/1' -d '{
    "query" : {
        "match" : {
            "message" : "bonsai tree"
        }
    }
}'

Percolate a first document:

curl -XGET 'localhost:9200/my-index/message/_percolate' -d '{
    "doc" : {
        "message" : "A new bonsai tree in the office"
    },
    "track_scores" : "true"
}'


//...returns
{"took": 1, "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
}, "total": 1, "matches": [
    {
        "_index": "my-index",
        "_id": "1",
        "_score": 1.0 <-- Score
    }
]}

Percolate a second (different) one:

# Percolate a second one
curl -XGET 'localhost:9200/my-index/message/_percolate' -d '{
    "doc" : {
        "message" : "A new bonsai tree in the office next to another bonsai tree is cool!"
    },
    "track_scores" : "true"
}'


//...returns
{"took": 3, "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
}, "total": 1, "matches": [
    {
        "_index": "my-index",
        "_id": "1",
        "_score": 1.0 <-- SAME Score, but different document (other score needed here!)
    }
]}



What I need

I want a score of something like 0.8 for the first document and something like 0.9 for the second one. They should not get the same score, as they did here. How can I achieve that?

Thanks a lot for any idea and help.

Recommended answer

A score is relative to the other documents in the data set. You could potentially do some sort of custom scoring that focuses only on the term frequency/inverse document frequency of the document at hand. It probably won't be terribly effective, but it might be good enough.
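To make the custom-scoring idea concrete, here is a minimal sketch in Python. It is not Elasticsearch's scoring: it only sums the square root of each query term's frequency in the single document at hand, and it leaves out inverse document frequency because one document carries no corpus statistics. The names `tokenize` and `local_tf_score` are made up for illustration, and the regex tokenizer is only a stand-in for a real analyzer.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-word characters; a stand-in for a real analyzer.
    return re.findall(r"\w+", text.lower())


def local_tf_score(query_text, doc_text):
    """Sum sqrt(term frequency) over the distinct query terms found in the document."""
    counts = Counter(tokenize(doc_text))
    query_terms = set(tokenize(query_text))
    # Counter returns 0 for absent terms, so non-matching terms add nothing.
    return sum(math.sqrt(counts[term]) for term in query_terms)


first = local_tf_score("bonsai tree", "A new bonsai tree in the office")
second = local_tf_score(
    "bonsai tree",
    "A new bonsai tree in the office next to another bonsai tree is cool!",
)
print(first, second)  # the second document scores higher than the first
```

With this formula the two sample documents from the question no longer tie, because the second one repeats both query terms. Whether such a purely local number is meaningful enough depends on the use case.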

I am not sure whether this is a viable solution for your problem, but one approach would be to re-run all matching percolate queries against the whole dataset, grab your document's score from those results, and re-index the document with that data. Since scores are all relative, this could then require you to update all the other documents matching the query as well. It would likely be best to do the global re-score at some set interval.
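One way to fetch a single document's score for a matching percolator query might look like the sketch below. It assumes the document has already been indexed under a known ID (`"42"` here is hypothetical) and uses the 1.x-era `filtered` query with an `ids` filter, in line with the `.percolator` type used in the question; later Elasticsearch versions express the same thing with a `bool` query and a `filter` clause. The helper names are made up, and sending the body to the `_search` endpoint is left out.

```python
def build_score_request(percolator_query, doc_id):
    """Build a search body that scores only the document with the given ID."""
    return {
        "query": {
            "filtered": {
                # The stored percolator query produces the score ...
                "query": percolator_query,
                # ... and the ids filter restricts the hits to one document.
                "filter": {"ids": {"values": [doc_id]}},
            }
        }
    }


def extract_score(search_response):
    """Return the score of the first hit, or None when the document did not match."""
    hits = search_response.get("hits", {}).get("hits", [])
    return hits[0]["_score"] if hits else None


# Assumption: "42" is the ID under which the percolated document was indexed.
body = build_score_request({"match": {"message": "bonsai tree"}}, "42")
```

Because the filter does not contribute to scoring, the returned `_score` is the one the query would assign to that document within the whole index, which is why it changes as other documents are added.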
