Sum over top_hits aggregation


Problem description

Question in short: if I have an aggregation for a top_hits per bucket, how do I sum a specific value in the resulting structure?

In detail:

I have a number of records, each containing a quantity for a given store. I want to get the sum of the latest quantity of each store.
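
For reference, a minimal data set consistent with the results shown further down could look like the bulk request below. The documents are reconstructed from the aggregation output (store 01 holds quantities 6 and 6, store 02 holds 12 and 11, with 2018-07-25 as the latest timestamp), so treat the exact values and the earlier timestamp as illustrative:

POST inventory-local/doc/_bulk
{ "index": {} }
{ "store": "01", "datetime": "2018-07-24T00:00:00Z", "quantity": 6 }
{ "index": {} }
{ "store": "01", "datetime": "2018-07-25T00:00:00Z", "quantity": 6 }
{ "index": {} }
{ "store": "02", "datetime": "2018-07-24T00:00:00Z", "quantity": 12 }
{ "index": {} }
{ "store": "02", "datetime": "2018-07-25T00:00:00Z", "quantity": 11 }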

To get the latest record per store, I create the following aggregation:

"latest_quantity_per_store": {
    "aggs": {
        "latest_quantity": {
            "top_hits": {
                "sort": [
                    {
                        "datetime": "desc"
                    },
                    {
                        "quantity": "asc"
                    }
                ],
                "_source": {
                    "includes": [
                        "quantity"
                    ]
                },
                "size": 1
            }
        }
    },
    "terms": {
        "field": "store",
        "size": 10000
    }
}

Suppose I have two stores, and two quantities per store for two different timestamps. This is the result of that aggregation:

"latest_quantity_per_store": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
        {
            "key": "01",
            "doc_count": 2,
            "latest_quantity": {
                "hits": {
                    "total": 2,
                    "max_score": null,
                    "hits": [
                        {
                            "_index": "inventory-local",
                            "_type": "doc",
                            "_id": "O6wFD2UBG8e7nvSU8dYg",
                            "_score": null,
                            "_source": {
                                "quantity": 6
                            },
                            "sort": [
                                1532476800000,
                                6
                            ]
                        }
                    ]
                }
            }
        },
        {
            "key": "02",
            "doc_count": 2,
            "latest_quantity": {
                "hits": {
                    "total": 2,
                    "max_score": null,
                    "hits": [
                        {
                            "_index": "inventory-local",
                            "_type": "doc",
                            "_id": "pLUFD2UBHBuSGcoH0ZT4",
                            "_score": null,
                            "_source": {
                                "quantity": 11
                            },
                            "sort": [
                                1532476800000,
                                11
                            ]
                        }
                    ]
                }
            }
        }
    ]
}

I now want to have an aggregation in ElasticSearch that takes the sum over these buckets; in the example data, the sum of 6 and 11. I tried the following aggregation:

"latest_quantity": {
    "sum_bucket": {
        "buckets_path": "latest_quantity_per_store>latest_quantity>hits>hits>_source>quantity"
    }
}

But this results in the following error:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "No aggregation [hits] found for path [latest_quantity_per_store>latest_quantity>hits>hits>_source>quantity]"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "inventory-local",
        "node": "3z5CqmmAQ-yT2sUCb69DzA",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "No aggregation [hits] found for path [latest_quantity_per_store>latest_quantity>hits>hits>_source>quantity]"
        }
      }
    ]
  },
  "status": 400
}

What is the correct aggregation to somehow get the number 17 out of ElasticSearch?

I did something similar for another aggregation that I had, an average instead of a top_hits aggregation:

"average_quantity": {
    "sum_bucket": {
        "buckets_path": "average_quantity_per_store>average_quantity"
    }
},
"average_quantity_per_store": {
    "aggs": {
        "average_quantity": {
            "avg": {
                "field": "quantity"
            }
        }
    },
    "terms": {
        "field": "store",
        "size": 10000
    }
}

This works as expected; this is the result:

"average_quantity_per_store": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
        {
            "key": "01",
            "doc_count": 2,
            "average_quantity": {
                "value": 6
            }
        },
        {
            "key": "02",
            "doc_count": 2,
            "average_quantity": {
                "value": 11.5
            }
        }
    ]
},
"average_quantity": {
    "value": 17.5
}


Answer

There's a way to solve this using a mix of the scripted_metric aggregation and the sum_bucket pipeline aggregation (https://www.elastic.co/guide/zh-CN/elasticsearch/reference/current/search-aggregations-pipeline-sum-bucket-aggregation.html). The scripted metric aggregation is a bit complex, but the main idea is that it allows you to provide your own bucketing algorithm and spit out a single metric value from it.

In your case, what you want to do is figure out the latest quantity for each store and then sum those store quantities. The solution looks like this; I'll explain some details below:

POST inventory-local/_search
{
  "size": 0,
  "aggs": {
    "bystore": {
      "terms": {
        "field": "store.keyword",
        "size": 10000
      },
      "aggs": {
        "latest_quantity": {
          "scripted_metric": {
            "init_script": "params._agg.quantities = new TreeMap()",
            "map_script": "params._agg.quantities.put(doc.datetime.date, [doc.datetime.date.millis, doc.quantity.value])",
            "combine_script": "return params._agg.quantities.lastEntry().getValue()",
            "reduce_script": "def maxkey = 0; def qty = 0; for (a in params._aggs) {def currentKey = a[0]; if (currentKey > maxkey) {maxkey = currentKey; qty = a[1]} } return qty;"
          }
        }
      }
    },
    "sum_latest_quantities": {
      "sum_bucket": {
        "buckets_path": "bystore>latest_quantity.value"
      }
    }
  }
}
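
As an aside, the params._agg / params._aggs objects used above are the pre-7.0 scripted metric syntax; from Elasticsearch 7.0 onwards the script context objects are called state and states instead. A rough sketch of the same aggregation in the newer syntax (untested, and note that date-field access idioms also changed between versions):

"latest_quantity": {
    "scripted_metric": {
        "init_script": "state.quantities = new TreeMap()",
        "map_script": "long ts = doc['datetime'].value.toInstant().toEpochMilli(); state.quantities.put(ts, [ts, doc['quantity'].value])",
        "combine_script": "return state.quantities.isEmpty() ? null : state.quantities.lastEntry().getValue()",
        "reduce_script": "def maxkey = 0L; def qty = 0; for (a in states) { if (a != null && a[0] > maxkey) { maxkey = a[0]; qty = a[1] } } return qty"
    }
}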

Note that in order for this to work, you need to set script.painless.regex.enabled: true in your elasticsearch.yml configuration file.

The init_script creates a TreeMap for each shard.
The map_script populates the TreeMap on each shard with date/quantity mappings. The value that we put into the map contains the timestamp and the quantity as a two-element array; we'll need that timestamp later in the reduce_script.
The combine_script simply takes the last value of the TreeMap, since that is the latest quantity for the given shard.
The bulk of the work is located in the reduce_script: we iterate over the latest quantities of all shards and return the one with the most recent timestamp.
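
To make those one-line scripts easier to follow, here is the same logic spelled out with comments (functionally identical to the scripts in the query above):

// init_script: create one sorted map per shard
params._agg.quantities = new TreeMap();

// map_script: for each document on the shard, store a [timestamp, quantity]
// pair keyed by the document's date; the TreeMap keeps the keys sorted
params._agg.quantities.put(doc.datetime.date,
    [doc.datetime.date.millis, doc.quantity.value]);

// combine_script: the last entry of the sorted map is the shard-local
// latest [timestamp, quantity] pair
return params._agg.quantities.lastEntry().getValue();

// reduce_script: across all shards, keep the quantity whose timestamp is highest
def maxkey = 0;
def qty = 0;
for (a in params._aggs) {
    def currentKey = a[0];     // timestamp in milliseconds
    if (currentKey > maxkey) {
        maxkey = currentKey;
        qty = a[1];            // quantity recorded at that timestamp
    }
}
return qty;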

At this point, we have the latest quantity for each store. All that remains to be done is to use a sum_bucket pipeline aggregation to sum the per-store quantities. And there you have the result of 17.

The response looks like this:

 "aggregations": {
    "bystore": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "01",
          "doc_count": 2,
          "latest_quantity": {
            "value": 6
          }
        },
        {
          "key": "02",
          "doc_count": 2,
          "latest_quantity": {
            "value": 11
          }
        }
      ]
    },
    "sum_latest_quantities": {
      "value": 17
    }
  }
