Weighted random sampling in Elasticsearch

Problem description

I need to obtain a random sample from an Elasticsearch index, i.e. to issue a query that retrieves documents from a given index with weighted probability Wj/ΣWi (where Wj is the weight of row j and ΣWi is the sum of the weights of all documents matched by this query).

Currently, I have the following query:

GET products/_search?pretty=true

{"size":5,
  "query": {
    "function_score": {
      "query": {
        "bool":{
          "must": {
            "term":
              {"category_id": "5df3ab90-6e93-0133-7197-04383561729e"}
          }
        }
      },
      "functions":
        [{"random_score":{}}]
    }
  },
  "sort": [{"_score":{"order":"desc"}}]
}

It returns 5 random items from the selected category. Each item has a field weight, so I probably have to use

"script_score": {
  "script": "weight = data['weight'].value / SUM; if (_score.doubleValue() > weight) {return 1;} else {return 0;}"
}

as described here.

I have the following issues:

  • What is the correct way to do this?
  • Do I need to enable Dynamic Scripting?
  • How do I calculate the sum of the weights of the documents matched by the query?

Thanks a lot for your help!

Solution

I know this question is old, but answering for any future searchers.

The comment before yours in the GitHub thread seems to have the answer. If each of your documents has a relative weight, then you can pick a random score for each document and multiply it by the weight to create your new weighted random score. This has the added bonus of not needing the sum of weights.
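
A minimal sketch of such a request body, sent to the same products/_search endpoint as in the question. It assumes the weight is stored in a non-negative numeric field named weight (as in the question); the "missing": 1 fallback and "boost_mode": "replace" are assumptions added here so that documents without a weight still get a score and the relevance score of the term query does not leak into the result.

```json
{
  "size": 5,
  "query": {
    "function_score": {
      "query": {
        "term": { "category_id": "5df3ab90-6e93-0133-7197-04383561729e" }
      },
      "functions": [
        { "random_score": {} },
        { "field_value_factor": { "field": "weight", "missing": 1 } }
      ],
      "score_mode": "multiply",
      "boost_mode": "replace"
    }
  }
}
```

Results are sorted by descending _score by default, so the explicit sort clause can be dropped. Because field_value_factor reads the field directly, this variant needs no script and therefore no dynamic scripting, and it never needs the sum of the weights.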

For example, if two documents have weights 1 and 2, then you'd expect the second to have twice the likelihood of selection as the first. Give each document a random score between 0 and 1 (which you're already doing with "random_score"). Multiply the random score by the weight and the first document gets a score between 0 and 1 while the second gets a score between 0 and 2, so the second is favoured. Note that the bias is not exactly 2:1: for this pair the second document comes out on top 3 times out of 4, so treat the weights as relative boosts rather than exact selection probabilities.
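
As a sanity check on the two-document example, the chance that the weight-2 document outranks the weight-1 document can be worked out by hand. The sketch below assumes the two random scores U_1 and U_2 are independent and uniform on (0, 1). The first line covers the multiply-by-weight scores and gives 3/4, i.e. odds of 3:1 rather than exactly 2:1. The second line uses a different scoring rule that is not part of the answer above, the key U^{1/w} from weighted reservoir sampling, which does give the exact ratio w_2/(w_1 + w_2); using it in Elasticsearch would require a script.

```latex
% Multiply-by-weight scores: S_1 = 1 \cdot U_1, \quad S_2 = 2 \cdot U_2
% Alternative keys (assumption, not from the answer): K_i = U_i^{1/w_i}, \; w_1 = 1, \; w_2 = 2
\begin{align*}
P(S_2 > S_1) &= 1 - P(2U_2 < U_1) = 1 - \int_0^1 \frac{u}{2}\,du = \frac{3}{4} \\
P(K_2 > K_1) &= P(U_2 > U_1^2) = 1 - \int_0^1 u^2\,du = \frac{2}{3} = \frac{w_2}{w_1 + w_2}
\end{align*}
```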
