Cumulative traffic by time of day with Elasticsearch


Question

I'm receiving requests/events from a large number of client applications. I'd like to use Elasticsearch to find out when my highest traffic point is.

One thing I've tried is a filter aggregation with a nested date histogram and then a nested "terms" aggregation that gets the distinct hour of the day via a script field. The following is my attempt, and it performs terribly (as I'd expect, since I'm executing a script per document).

{
  "aggs": {
    "sites_within_range": {
      "filter" : { 
        "range" : { 
          "occurred" : { 
            "gt" : "now-1M"
          }
        } 
      },

      "aggs": {
        "sites_over_time": {
          "date_histogram": {
            "field": "occurred",
            "interval": "week"
          },
          "aggs":{
            "site_names": {
              "terms": {
                "script": "doc['occurred'].date.getHourOfDay()",
                "size": 10000
              }
            }
          }
        }
      }

    }
  }
}

I've also considered storing the date elements I want to query as distinct parts of the document, e.g.:

{
    "date": "actual datetime",
    "day": "monday",
    "hour": 8,
    "minute": 37
}

This also smells like the wrong answer to me.

Edit: after some investigation, it looks like I might be interested in the new cardinality / percentiles aggregations coming in 1.1?

Recommended answer

The same question has already been asked in this thread.

Adapting that solution to your problem, we need a script that converts the date into the hour of day:

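// 'occurred' is the date field used by the range filter in the question
Date date = new Date(doc['occurred'].value);
// 'HH' formats the hour of day as 00-23
java.text.SimpleDateFormat format = new java.text.SimpleDateFormat('HH');
format.format(date)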

Then use it in the query (the sub-aggregation name "hourOfDay" is arbitrary):

{
    "aggs": {
        "perWeekDay": {
            "filter": {
                "range": {
                    "occurred": {
                        "gt": "now-1M"
                    }
                }
            },
            "aggs": {
                "hourOfDay": {
                    "terms": {
                        "script": "Date date = new Date(doc['occurred'].value); java.text.SimpleDateFormat format = new java.text.SimpleDateFormat('HH'); format.format(date)"
                    }
                }
            }
        }
    }
}

This gives you the traffic per hour.
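Once the per-hour buckets come back, picking the busiest hour is a small client-side step. Here is a minimal Python sketch, assuming each bucket has the usual `key` and `doc_count` entries of a terms aggregation response. The helper name `peak_hour` and the sample counts are made up for illustration.

```python
def peak_hour(buckets):
    """Return (hour, doc_count) for the busiest bucket, or None if there are none."""
    if not buckets:
        return None
    busiest = max(buckets, key=lambda b: b["doc_count"])
    # The 'HH' script yields zero-padded strings such as "08", so convert to int.
    return int(busiest["key"]), busiest["doc_count"]


# Made-up sample buckets, shaped like a terms aggregation result.
sample = [
    {"key": "08", "doc_count": 120},
    {"key": "14", "doc_count": 310},
    {"key": "21", "doc_count": 95},
]
print(peak_hour(sample))
```

A terms aggregation returns only its top 10 buckets by default, ordered by document count, so raise `size` to 24 if you want the full hourly profile rather than just the peak.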

Nota bene: Storing the hours/days/minutes in your document is the most efficient way of doing that kind of aggregation. My answer assumes you don't want to store that information. Scripts usually aren't very efficient.
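If you do decide to store the parts, the enrichment can happen in the client before indexing. The following Python sketch is one way to do it, not a definitive implementation. It assumes `occurred` is an ISO 8601 string and reuses the `day`/`hour`/`minute` field names from the question. The helper names `with_time_parts` and `hourly_traffic_query`, and the aggregation name `by_hour`, are hypothetical.

```python
from datetime import datetime

DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]


def with_time_parts(doc, field="occurred"):
    """Return a copy of doc with day/hour/minute derived from doc[field].

    Assumes doc[field] is an ISO 8601 string that datetime.fromisoformat accepts.
    """
    ts = datetime.fromisoformat(doc[field])
    enriched = dict(doc)
    enriched["day"] = DAYS[ts.weekday()]  # weekday(): Monday is 0
    enriched["hour"] = ts.hour
    enriched["minute"] = ts.minute
    return enriched


def hourly_traffic_query(since="now-1M"):
    """Build a search body that counts documents per stored hour, with no script."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"range": {"occurred": {"gt": since}}},
        "aggs": {"by_hour": {"terms": {"field": "hour", "size": 24}}},
    }


print(with_time_parts({"occurred": "2014-03-03T08:37:00"}))
```

With `hour` stored as a plain numeric field, the terms aggregation reads indexed values directly instead of running a script for every document.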
