How to select the last bucket in a date_histogram selector in Elasticsearch

Problem description

I have a date_histogram, and I can use max_bucket to get the bucket with the greatest value, but I want to select the last bucket (i.e. the bucket with the highest timestamp).

Using max_bucket to get the greatest value works fine, but I don't know what to put in buckets_path to get the last bucket.

My mapping:

{
  "ee-2020-02-28" : {
    "mappings" : {
      "dynamic" : "strict",
      "properties" : {
        "date" : {
          "type" : "date"
        },
        "frequency" : {
          "type" : "long"
        },
        "keyword" : {
          "type" : "keyword"
        },
        "text" : {
          "type" : "text"
        }
      }
    }
  }
}

My working query, which returns the bucket for the day with the highest frequency (the aggregation is named last_day because this is a work-in-progress query toward my goal):

{
    "query": {
        "range": {
            "date": { /* Start away from the beginning of the data, so the rolling avg is full */
                "gte": "2019-02-18"/*,
                "lte": "2020-12-14"*/
            }
        }
    },
    "aggs": {
        "palabrejas": {
            "terms": {
                "field": "keyword",
                "size": 100
            },
            "aggs": {
                "nnndiario": {
                    "date_histogram": {
                        "field": "date",
                        "calendar_interval": "day"
                    },
                    "aggs": {
                        "dailyfreq": {
                            "sum": {
                                "field": "frequency"
                            }
                        }
                    }
                },
                "ventanuco": {
                    "avg_bucket": {
                        "buckets_path": "nnndiario>dailyfreq",
                        "gap_policy": "insert_zeros"
                    }
                },
                "last_day": {
                    "max_bucket": {
                        "buckets_path": "nnndiario>dailyfreq"
                    }
                }
            }
        }
    }
}

Its output (note that I replaced long parts with [...]):

{
  "aggregations" : {
    "palabrejas" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "rama0",
          "doc_count" : 20400,
          "nnndiario" : {
            "buckets" : [
              {
                "key_as_string" : "2020-01-01T00:00:00.000Z",
                "key" : 1577836800000,
                "doc_count" : 600,
                "dailyfreq" : {
                  "value" : 3000.0
                }
              },
              {
                "key_as_string" : "2020-01-02T00:00:00.000Z",
                "key" : 1577923200000,
                "doc_count" : 600,
                "dailyfreq" : {
                  "value" : 3000.0
                }
              },
              {
                "key_as_string" : "2020-01-03T00:00:00.000Z",
                "key" : 1578009600000,
                "doc_count" : 600,
                "dailyfreq" : {
                  "value" : 3000.0
                }
              },
              [...]
              {
                "key_as_string" : "2020-01-31T00:00:00.000Z",
                "key" : 1580428800000,
                "doc_count" : 600,
                "dailyfreq" : {
                  "value" : 3000.0
                }
              }
            ]
          },
          "ventanuco" : {
            "value" : 3290.3225806451615
          },
          "last_day" : {
            "value" : 12000.0,
            "keys" : [
              "2020-01-13T00:00:00.000Z"
            ]
          }
        },
        {
          "key" : "rama1",
          "doc_count" : 20400,
          "nnndiario" : {
            "buckets" : [
              {
                "key_as_string" : "2020-01-01T00:00:00.000Z",
                "key" : 1577836800000,
                "doc_count" : 600,
                "dailyfreq" : {
                  "value" : 3000.0
                }
              },
              [...]
            ]
          },
          "ventanuco" : {
            "value" : 3290.3225806451615
          },
          "last_day" : {
            "value" : 12000.0,
            "keys" : [
              "2020-01-13T00:00:00.000Z"
            ]
          }
        },
        [...]
        }
      ]
    }
  }
}

I don't know what to put in last_day's buckets_path to obtain the last bucket.

Recommended answer

You might consider using a terms aggregation instead of a date_histogram aggregation:

"max_date_bucket_agg": {
  "terms": {
    "field": "date",
    "size": 1, 
    "order": {"_key": "desc"} 
  }
}
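
To show where this aggregation could sit in the query from the question, here is a minimal Python sketch that builds the request body as a plain dict. It is an illustration under assumptions, not a tested drop-in: the function names build_last_day_query and last_day_per_keyword are hypothetical, the terms aggregation reuses the name last_day and the dailyfreq sum from the question, and the helper assumes the usual terms-bucket response layout (key, key_as_string, sub-aggregation values).

```python
import json


def build_last_day_query(start_date: str) -> dict:
    """Build a search body that keeps, per keyword, only the newest date bucket."""
    return {
        "query": {"range": {"date": {"gte": start_date}}},
        "aggs": {
            "palabrejas": {
                "terms": {"field": "keyword", "size": 100},
                "aggs": {
                    "last_day": {
                        # One bucket per distinct date value, sorted newest
                        # first and cut off after the first bucket.
                        "terms": {
                            "field": "date",
                            "size": 1,
                            "order": {"_key": "desc"},
                        },
                        "aggs": {"dailyfreq": {"sum": {"field": "frequency"}}},
                    }
                },
            }
        },
    }


def last_day_per_keyword(response: dict) -> dict:
    """Map each keyword to (date, dailyfreq) of its newest bucket.

    Assumes the response layout of a terms aggregation with a sum sub-aggregation.
    """
    result = {}
    for kw_bucket in response["aggregations"]["palabrejas"]["buckets"]:
        buckets = kw_bucket["last_day"]["buckets"]
        if buckets:  # skip keywords without any date bucket
            newest = buckets[0]
            result[kw_bucket["key"]] = (
                newest.get("key_as_string", newest["key"]),
                newest["dailyfreq"]["value"],
            )
    return result


if __name__ == "__main__":
    # Print the body so it can be pasted into a search request.
    print(json.dumps(build_last_day_query("2019-02-18"), indent=2))
```

The dict returned by build_last_day_query serializes to the JSON body of a search request. The avg_bucket aggregation from the question is left out here because it depends on the date_histogram, which could be kept alongside last_day if the rolling average is still needed.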

One issue might be the granularity of your data: a terms aggregation creates one bucket per distinct value, so timestamps finer than a day would each get their own bucket. You may consider storing the date value at the expected granularity (e.g. day) in a separate field and using that field in the terms aggregation.
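
One way to populate such a field is to derive it on the client before indexing. The sketch below is an assumption-laden illustration: the field name day, the constant DAY_FIELD_MAPPING, and the function with_day_field are hypothetical, and it assumes the date values are ISO 8601 strings. Because the mapping in the question uses "dynamic": "strict", the new field would have to be added to the mapping before documents can carry it.

```python
from datetime import datetime, timezone

# Hypothetical mapping addition for the day-granularity field; with
# "dynamic": "strict" the field must be declared before it is indexed.
DAY_FIELD_MAPPING = {
    "properties": {"day": {"type": "date", "format": "yyyy-MM-dd"}}
}


def with_day_field(doc: dict) -> dict:
    """Return a copy of doc with a 'day' field (UTC calendar day) derived from 'date'."""
    raw = doc["date"]
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    day = parsed.astimezone(timezone.utc).date().isoformat()
    return {**doc, "day": day}
```

With documents indexed this way, the terms aggregation would use "field": "day" instead of "field": "date", so each bucket covers one calendar day.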
