How to select the last bucket in a date_histogram selector in Elasticsearch
Question
I have a date_histogram, and I can use max_bucket to get the bucket with the greatest value, but I want to select the last bucket (i.e. the bucket with the highest timestamp).
Using max_bucket to get the greatest value works fine, but I don't know what to put in the buckets_path to get the last bucket.
My mapping:
{
  "ee-2020-02-28" : {
    "mappings" : {
      "dynamic" : "strict",
      "properties" : {
        "date" : {
          "type" : "date"
        },
        "frequency" : {
          "type" : "long"
        },
        "keyword" : {
          "type" : "keyword"
        },
        "text" : {
          "type" : "text"
        }
      }
    }
  }
}
My working query, which returns the bucket for the day with the highest frequency (it's named last_day because this is a WIP query to get to my goal):
{
  "query": {
    "range": {
      "date": { /* Start away from the beginning of data, so the rolling avg is full */
        "gte": "2019-02-18"/*,
        "lte": "2020-12-14"*/
      }
    }
  },
  "aggs": {
    "palabrejas": {
      "terms": {
        "field": "keyword",
        "size": 100
      },
      "aggs": {
        "nnndiario": {
          "date_histogram": {
            "field": "date",
            "calendar_interval": "day"
          },
          "aggs": {
            "dailyfreq": {
              "sum": {
                "field": "frequency"
              }
            }
          }
        },
        "ventanuco": {
          "avg_bucket": {
            "buckets_path": "nnndiario>dailyfreq",
            "gap_policy": "insert_zeros"
          }
        },
        "last_day": {
          "max_bucket": {
            "buckets_path": "nnndiario>dailyfreq"
          }
        }
      }
    }
  }
}
Its output (note that I replaced long parts with [...]):
{
  "aggregations" : {
    "palabrejas" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "rama0",
          "doc_count" : 20400,
          "nnndiario" : {
            "buckets" : [
              {
                "key_as_string" : "2020-01-01T00:00:00.000Z",
                "key" : 1577836800000,
                "doc_count" : 600,
                "dailyfreq" : {
                  "value" : 3000.0
                }
              },
              {
                "key_as_string" : "2020-01-02T00:00:00.000Z",
                "key" : 1577923200000,
                "doc_count" : 600,
                "dailyfreq" : {
                  "value" : 3000.0
                }
              },
              {
                "key_as_string" : "2020-01-03T00:00:00.000Z",
                "key" : 1578009600000,
                "doc_count" : 600,
                "dailyfreq" : {
                  "value" : 3000.0
                }
              },
              [...]
              {
                "key_as_string" : "2020-01-31T00:00:00.000Z",
                "key" : 1580428800000,
                "doc_count" : 600,
                "dailyfreq" : {
                  "value" : 3000.0
                }
              }
            ]
          },
          "ventanuco" : {
            "value" : 3290.3225806451615
          },
          "last_day" : {
            "value" : 12000.0,
            "keys" : [
              "2020-01-13T00:00:00.000Z"
            ]
          }
        },
        {
          "key" : "rama1",
          "doc_count" : 20400,
          "nnndiario" : {
            "buckets" : [
              {
                "key_as_string" : "2020-01-01T00:00:00.000Z",
                "key" : 1577836800000,
                "doc_count" : 600,
                "dailyfreq" : {
                  "value" : 3000.0
                }
              },
              [...]
            ]
          },
          "ventanuco" : {
            "value" : 3290.3225806451615
          },
          "last_day" : {
            "value" : 12000.0,
            "keys" : [
              "2020-01-13T00:00:00.000Z"
            ]
          }
        },
        [...]
        }
      ]
    }
  }
}
I don't know what to put in last_day's buckets_path to obtain the last bucket.
Recommended answer
You might consider using a terms aggregation instead of a date_histogram aggregation:
"max_date_bucket_agg": {
"terms": {
"field": "date",
"size": 1,
"order": {"_key": "desc"}
}
}
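To make the placement concrete, here is a minimal Python sketch of how this terms aggregation might sit inside the question's per-keyword aggregation, together with a helper that reads the single returned bucket. It only builds and parses plain dictionaries and does not call any client library. The function names are hypothetical, the range query from the question is left out for brevity, and nesting the dailyfreq sum under the new aggregation is an assumption about what the asker wants. Note that a terms aggregation groups by exact field value, so this yields a whole day only if the date values are already day-granular, as they appear to be in the sample output.

```python
def build_request(date_field="date", value_field="frequency"):
    """Build a request body where each keyword bucket keeps only its latest date bucket.

    Hypothetical helper: aggregation names are reused from the question.
    """
    last_day = {
        "terms": {
            "field": date_field,
            "size": 1,                    # keep a single bucket...
            "order": {"_key": "desc"},    # ...the one with the highest date key
        },
        # Assumption: the asker wants the summed frequency of that last bucket.
        "aggs": {"dailyfreq": {"sum": {"field": value_field}}},
    }
    return {
        "aggs": {
            "palabrejas": {
                "terms": {"field": "keyword", "size": 100},
                "aggs": {"last_day": last_day},
            }
        }
    }


def last_day_per_keyword(response):
    """Map each keyword to (date key, summed frequency) of its last bucket."""
    result = {}
    for kw_bucket in response["aggregations"]["palabrejas"]["buckets"]:
        buckets = kw_bucket["last_day"]["buckets"]
        if not buckets:
            continue  # keyword without any dated documents
        last = buckets[0]
        # Prefer the formatted date if present, else fall back to the raw key.
        date_key = last.get("key_as_string", last["key"])
        result[kw_bucket["key"]] = (date_key, last["dailyfreq"]["value"])
    return result
```

The dictionary returned by build_request() can be serialized with json.dumps and sent as the search body; last_day_per_keyword() expects the parsed JSON response.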
One issue might be the granularity of your data: you may consider storing the date value at the expected granularity (e.g. day) in a separate field and using that field in the terms aggregation.
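As a rough illustration of the separate-field idea, the sketch below adds a day-granular copy of the date to a document before it is indexed. The field name date_day and the helper name are hypothetical, and treating timestamps without an offset as UTC is an assumption. Because the mapping in the question uses "dynamic": "strict", such a field would also have to be added to the mapping before documents containing it can be indexed.

```python
from datetime import datetime, timezone


def with_day_field(doc, source_field="date", day_field="date_day"):
    """Return a copy of doc with an extra field holding the date truncated to the UTC day.

    Hypothetical helper: `date_day` is an illustrative field name, not part of the
    original mapping.
    """
    raw = doc[source_field]
    # Older Python versions do not accept a trailing "Z", so normalize it first.
    ts = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    day = ts.astimezone(timezone.utc).date().isoformat()
    return {**doc, day_field: day}
```

The terms aggregation would then use date_day as its field, so that every document from the same day falls into the same bucket.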