Weighted random sampling in Elasticsearch

Problem description

I need to obtain a random sample from an Elasticsearch index, i.e. to issue a query that retrieves some documents from a given index with weighted probability Wj/ΣWi (where Wj is the weight of row j and ΣWi is the sum of the weights of all documents matched by this query).
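To make the target distribution concrete, here is a minimal client-side sketch in plain Python. It does not touch Elasticsearch; the document IDs and weights are hypothetical, and `selection_probabilities` is an illustrative helper name, not something from the original post.

```python
import random


def selection_probabilities(weights):
    """Map each document ID to Wj / sum(Wi)."""
    total = sum(weights.values())
    return {doc_id: w / total for doc_id, w in weights.items()}


# Hypothetical weights for three documents.
weights = {"a": 1.0, "b": 3.0, "c": 6.0}
probs = selection_probabilities(weights)

# Draw one document with probability proportional to its weight.
picked = random.choices(list(weights), weights=list(weights.values()), k=1)[0]
print(probs, picked)
```

The question is how to get the same behaviour server-side, without pulling every document and its weight to the client.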

Currently, I have the following query:

GET products/_search?pretty=true

{"size":5,
  "query": {
    "function_score": {
      "query": {
        "bool":{
          "must": {
            "term":
              {"category_id": "5df3ab90-6e93-0133-7197-04383561729e"}
          }
        }
      },
      "functions":
        [{"random_score":{}}]
    }
  },
  "sort": [{"_score":{"order":"desc"}}]
}

It returns 5 random items from the selected category. Each item has a field named weight. So, I probably have to use

"script_score": {
  "script": "weight = data['weight'].value / SUM; if (_score.doubleValue() > weight) {return 1;} else {return 0;}"
}

as described here.

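I have the following questions:

  • What is the correct way to do this?
  • Do I need to enable dynamic scripting?
  • How do I calculate the total sum of weights for the query?

Thanks a lot for your help!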

Recommended answer

In case it helps anyone, here is how I recently implemented weighted shuffling.

In this example, we shuffle companies. Each company has a "company_score" between 0 and 100. With this simple weighted shuffling, a company with a score of 100 is 5 times more likely to appear on the first page than a company with a score of 20.
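The effect of multiplying a random value by the company score can be checked with a small simulation. This is a sketch under assumptions: Python's `random` stands in for `random_score` (taken to be uniform on [0, 1)), only two companies are compared, and `simulate` is an illustrative helper name.

```python
import random


def simulate(weight_a=100, weight_b=20, trials=100_000, seed=0):
    """Simulate combined score = random * weight for two documents.

    Returns (ratio of mean scores, share of trials where A outranks B).
    """
    rng = random.Random(seed)
    total_a = total_b = 0.0
    a_first = 0
    for _ in range(trials):
        score_a = rng.random() * weight_a
        score_b = rng.random() * weight_b
        total_a += score_a
        total_b += score_b
        if score_a > score_b:
            a_first += 1
    return total_a / total_b, a_first / trials


ratio, a_first_share = simulate()
print(f"mean score ratio: {ratio:.2f}")
print(f"share of trials where A outranks B: {a_first_share:.3f}")
```

Under these assumptions the ratio of mean scores comes out close to 5, while the head-to-head share is close to 1 - 20/(2*100) = 0.9. The factor of 5 therefore describes the average combined score; how often a company actually lands on the first page also depends on how many documents compete and on the page size.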

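# Placeholder so the snippet runs on its own (an assumption); substitute your real query.
main_query = {"match_all": {}}

json_body = {
    "sort": ["_score"],
    "query": {
        "function_score": {
            "query": main_query,  # put your main query here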
            "functions": [
                {
                    "random_score": {},
                },
                {
                    "field_value_factor": {
                        "field": "company_score",
                        "modifier": "none",
                        "missing": 0,
                    }
                }
            ],
            # How to combine the result of the two functions 'random_score' and 'field_value_factor'.
            # This way, on average the combined _score of a company having score 100 will be 5 times as much
            # as the combined _score of a company having score 20, and thus will be 5 times more likely
            # to appear on first page.
            "score_mode": "multiply",
            # How to combine the result of function_score with the original _score from the query.
            # We overwrite it as our combined _score (random x company_score) is all we need.
            "boost_mode": "replace",
        }
    }
}
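The question also asks how to compute the sum of weights for a query. One possible approach, which is not part of the answer above, is a separate request with a `sum` aggregation over the weight field. The sketch below only builds the request body and reads the result from a response dict; the helper names, the aggregation name `total_weight`, and the sample response are all hypothetical.

```python
def build_weight_sum_body(query, field="weight"):
    """Return a search body that sums `field` over documents matching `query`."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": query,
        "aggs": {"total_weight": {"sum": {"field": field}}},
    }


def extract_weight_sum(response):
    """Read the aggregated sum out of a search response dict."""
    return response["aggregations"]["total_weight"]["value"]


# Same term filter as in the question.
category_query = {"term": {"category_id": "5df3ab90-6e93-0133-7197-04383561729e"}}
body = build_weight_sum_body(category_query)

# Hand-written stand-in for a real response, for illustration only.
sample_response = {"aggregations": {"total_weight": {"value": 42.0}}}
total = extract_weight_sum(sample_response)
print(body, total)
```

The weighted shuffle in the answer does not need this sum, because sorting by the combined score depends only on relative weights.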
