Elasticsearch aggregations by hour returning duplicate hours

Question

I ran the following query in Elasticsearch to find the hours of the day with the most visits, based on the field acctstarttime.

However, because my data spans more than one day, the same hour is returned several times (once for each day it occurs on), when I expected a single result per hour with its total count.

Query:

{
    "size" : 0,
    "query" : {
        "filtered" : {
            "query": {
                    "match": { "client_id" : 1 }
            },
            "filter" : {
                "bool" : {
                    "must" : [
                        {
                            "range" : {
                                "acctstarttime" : {
                                    "gte" : "2016-05-01 00:00:00",
                                    "lte" : "2016-06-02 23:59:59"
                                }
                            }
                        }
                    ]
                }
            }
        }
    },
    "aggs" : {
        "visits_per_hour" : {
            "date_histogram" : {
                "field" : "acctstarttime",
                "interval" : "hour",
                "format" : "HH"
            }
        }
    }
}

Result:

"aggregations": {
    "visits_per_hour": {
        "buckets": [
            {
                "key_as_string": "17",
                "key": 1463763600000,
                "doc_count": 6
            },
            {
                "key_as_string": "18",
                "key": 1463767200000,
                "doc_count": 3
            },
            {
                "key_as_string": "22",
                "key": 1464127200000,
                "doc_count": 1
            },
            {
                "key_as_string": "22",
                "key": 1464300000000,
                "doc_count": 2
            },
            {
                "key_as_string": "22",
                "key": 1464559200000,
                "doc_count": 1
            }
        ]
    }
}

Expected:

"aggregations": {
    "visits_per_hour": {
        "buckets": [
            {
                "key_as_string": "17",
                "key": 1463763600000,
                "doc_count": 6
            },
            {
                "key_as_string": "18",
                "key": 1463767200000,
                "doc_count": 3
            },
            {
                "key_as_string": "22",
                "key": 1464127200000,
                "doc_count": 4
            }
        ]
    }
}

Recommended answer

A date_histogram creates one bucket per calendar hour, and "format": "HH" only changes how each bucket key is displayed, so the same hour of day appears once for every day that has data. You have two solutions:

  1. Add another field, hour, at indexing time and aggregate on that field.
  2. Use a small script that extracts the hour and aggregate on its result (note: you need to enable dynamic scripting).

The first solution is the preferred one, as it will perform better.
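The answer does not show the first solution, so here is a minimal sketch of what it might look like. It assumes documents are prepared in Python before indexing, that acctstarttime is stored as "yyyy-MM-dd HH:mm:ss" (the layout used in the question's range filter), and that the new field is named hour and mapped as an integer. The helper name with_hour_field is hypothetical, and no timezone conversion is applied.

```python
from datetime import datetime

# Assumed timestamp layout, taken from the range filter in the question.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def with_hour_field(doc):
    """Return a copy of doc with an integer `hour` (0-23) derived from acctstarttime."""
    started = datetime.strptime(doc["acctstarttime"], TIMESTAMP_FORMAT)
    enriched = dict(doc)  # leave the caller's document untouched
    enriched["hour"] = started.hour  # `hour` is a hypothetical field name
    return enriched


# Request body that aggregates on the precomputed field with a terms
# aggregation. size=24 covers every possible hour of the day; buckets come
# back ordered by doc_count, highest first, which is the terms default.
VISITS_PER_HOUR_QUERY = {
    "size": 0,
    "aggs": {
        "visits_per_hour": {
            "terms": {"field": "hour", "size": 24}
        }
    },
}

example = with_hour_field({"client_id": 1, "acctstarttime": "2016-05-25 22:10:00"})
print(example["hour"])
```

Because every document on any day with a 22:xx start time carries hour = 22, they all fall into the same bucket, and no script has to run at query time. Documents that are already indexed would need to be reindexed to receive the new field.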

The second solution looks like this:

{
  "size": 0,
  "aggs": {
    "visits_per_hour": {
      "histogram": {
        "script": "doc.acctstarttime.date.getHourOfDay()",
        "interval": 1,
        "order": {
          "_key": "desc"
        }
      }
    }
  }
}
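As a cross-check on what the merged output should contain, the duplicate buckets from the question can also be collapsed on the client side. This is not part of the original answer, only a small sketch in Python that sums doc_count per key_as_string; the function name merge_by_hour is hypothetical, and the bucket data is copied from the result shown in the question.

```python
from collections import Counter


def merge_by_hour(buckets):
    """Sum doc_count over buckets that share the same key_as_string (hour of day)."""
    totals = Counter()
    for bucket in buckets:
        totals[bucket["key_as_string"]] += bucket["doc_count"]
    # Busiest hours first; ties are broken by the hour label.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Buckets exactly as returned by the date_histogram query in the question.
buckets = [
    {"key_as_string": "17", "key": 1463763600000, "doc_count": 6},
    {"key_as_string": "18", "key": 1463767200000, "doc_count": 3},
    {"key_as_string": "22", "key": 1464127200000, "doc_count": 1},
    {"key_as_string": "22", "key": 1464300000000, "doc_count": 2},
    {"key_as_string": "22", "key": 1464559200000, "doc_count": 1},
]

merged = merge_by_hour(buckets)
print(merged)  # [('17', 6), ('22', 4), ('18', 3)]
```

The three "22" buckets add up to 4, matching the expected output in the question. Merging on the client still transfers one bucket per hour per day, so it only makes sense for short date ranges.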
