Elasticsearch: Possible to process aggregation results?


Question

I calculate the duration of my service processes using the sum aggregation. Each step of an executed process is saved in Elasticsearch under a calling ID.

This is what I monitor:

Duration of Request-Processing for ID #123 (calling service #1)

Duration of Server-Response for ID #123 (calling service #1)

**Complete Duration for ID #123**

Duration of Request-Processing for ID #124 (calling service #1)

Duration of Server-Response for ID #124 (calling service #1)

**Complete duration for ID #124**

The query:

{
    "from": 0,
    "size": 0,
    "query": {
        "filtered": {
            "query": { "match_all": {} },
            "filter": {
                "term": {
                    "callingId": "123"
                }
            }
        }
    },
    "aggs": {
        "total_duration": { "sum": { "field": "duration" } },
        "max_duration": { "max": { "field": "duration" } },
        "min_duration": { "min": { "field": "duration" } }
    }
}

This returns the complete duration of the process and also tells me which part of the process was the fastest and which part was the slowest.
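A note for readers on newer Elasticsearch versions: the `filtered` query used above was deprecated in 2.0 and removed in 5.0, where a `bool` query with a `filter` clause takes its place. Below is a minimal sketch of building the equivalent request body in Python. The helper name `build_duration_query` is made up for illustration, and sending the body to a cluster is deliberately left out.

```python
import json


def build_duration_query(calling_id):
    """Build the sum/max/min request for one call (hypothetical helper).

    Uses bool + filter instead of the removed `filtered` query.
    """
    return {
        "size": 0,  # only the aggregations are needed, not the hits
        "query": {
            "bool": {
                "filter": [{"term": {"callingId": calling_id}}]
            }
        },
        "aggs": {
            "total_duration": {"sum": {"field": "duration"}},
            "max_duration": {"max": {"field": "duration"}},
            "min_duration": {"min": {"field": "duration"}},
        },
    }


body = build_duration_query("123")
print(json.dumps(body, indent=2))
```

The field names (`callingId`, `duration`) are the ones from the query above; adjust them if your mapping differs.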

Next I want to calculate the average duration of all finished processes by serviceId. In this case I only care about the total duration of each call to a service, so I can compare them.

How can I compute the average, minimum and maximum of my total_durations?

Edit: I added some sample data; I hope you can work with it.

Call1:

{
"callerId":"U1",
"operation":"Initialize",
"status":"INITIALIZED",
"duration":1,
"serviceId":"1"
}

{
"callerId":"U1",
"operation":"Calculate",
"status":"STARTED",
"duration":1,
"serviceId":"1"
}

{
"callerId":"U1",
"operation":"Finish",
"status":"FINISHED",
"duration":1200,
"serviceId":"1"
}

sum: 1202

Call2:

{
"callerId":"U2",
"operation":"Initialize",
"status":"INITIALIZED",
"duration":2,
"serviceId":"1"
}

{
"callerId":"U2",
"operation":"Calculate",
"status":"STARTED",
"duration":1,
"serviceId":"1"
}

{
"callerId":"U2",
"operation":"Finish",
"status":"FINISHED",
"duration":1030,
"serviceId":"1"
}

sum: 1033

Aggregation over all service calls for service ID #1. This is what I want to calculate:

Max: 1202
Min: 1033
AVG: 1117.5
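To make the target numbers concrete, here is a small plain-Python sketch of the two-level calculation: first sum the durations per callerId, then take max, min and average over those sums. It only mirrors the arithmetic on the six sample documents; the function name `total_duration_stats` is an assumption for illustration and nothing here talks to Elasticsearch.

```python
from collections import defaultdict

# The six sample documents from above (only the fields used here).
docs = [
    {"callerId": "U1", "duration": 1, "serviceId": "1"},
    {"callerId": "U1", "duration": 1, "serviceId": "1"},
    {"callerId": "U1", "duration": 1200, "serviceId": "1"},
    {"callerId": "U2", "duration": 2, "serviceId": "1"},
    {"callerId": "U2", "duration": 1, "serviceId": "1"},
    {"callerId": "U2", "duration": 1030, "serviceId": "1"},
]


def total_duration_stats(documents, service_id):
    """Sum durations per call, then aggregate over the per-call totals."""
    totals = defaultdict(int)
    for d in documents:
        if d["serviceId"] == service_id:
            totals[d["callerId"]] += d["duration"]
    values = list(totals.values())  # raises on max() if no call matched
    return {
        "max": max(values),
        "min": min(values),
        "avg": sum(values) / len(values),
    }


stats = total_duration_stats(docs, "1")
print(stats)  # {'max': 1202, 'min': 1033, 'avg': 1117.5}
```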


Answer

It is a bit more complicated, but here it goes (this requires Elasticsearch 1.4 or later, because it relies on the scripted metric aggregation introduced in that version):

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "serviceId": 1
        }
      }
    }
  },
  "aggs": {
    "executionTimes": {
      "scripted_metric": {
        "init_script": "_agg['values'] = new java.util.HashMap();",
        "map_script": "if (_agg.values[doc['callerId'].value]==null) {_agg.values[doc['callerId'].value]=[doc['duration'].value];} else {_agg.values[doc['callerId'].value].add(doc['duration'].value);}",
        "combine_script":"someHashMap = new java.util.HashMap();for(x in _agg.values.keySet()) {value=_agg.values[x]; sum=0; for(y in value) {sum+=y}; someHashMap.put(x,sum)}; return someHashMap;",
        "reduce_script": "finalArray = []; finalMap = new java.util.HashMap(); for(map in _aggs){for(x in map.keySet()){if(finalMap.containsKey(x)){value=finalMap.get(x);finalMap.put(x,value+map.get(x));} else {finalMap.put(x,map.get(x))}}}; finalAvgValue=0; finalMaxValue=-1; finalMinValue=-1; for(key in finalMap.keySet()){currentValue=finalMap.get(key);finalAvgValue+=currentValue; if(finalMinValue<0){finalMinValue=currentValue} else if(finalMinValue>currentValue){finalMinValue=currentValue}; if(currentValue>finalMaxValue) {finalMaxValue=currentValue}}; finalArray.add(finalMaxValue); finalArray.add(finalMinValue); finalArray.add(finalAvgValue/finalMap.size()); return finalArray",
        "lang": "groovy"
      }
    }
  }
}

I'm not saying this is the best approach, only that it is the one I could find, and the solution is probably not in its best form either: it could be cleaned up and improved. I wanted to show that it is possible. Keep in mind that it only works from version 1.4 onward.

The basic idea is to use the scripts to build a data structure that holds the information you need, computed in the separate steps (init, map, combine, reduce) of the scripted metric aggregation. Note that the aggregation is performed for only one serviceId. If you want to do this for all serviceIds, you will probably need to rethink the data structure used in the scripts.
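The four phases can be easier to follow outside of one-line Groovy strings. The sketch below imitates them in plain Python under stated assumptions: the two "shards" are just lists chosen so that call U1 is split across both, and the function names are hypothetical stand-ins for `init_script`, `map_script`, `combine_script` and `reduce_script`. It is a model of the data flow, not how Elasticsearch executes the scripts.

```python
def init_state():
    # init_script: one empty state per shard
    return {"values": {}}


def map_doc(state, doc):
    # map_script: collect each duration in a list keyed by callerId
    state["values"].setdefault(doc["callerId"], []).append(doc["duration"])


def combine(state):
    # combine_script: per shard, collapse each list into a partial sum
    return {caller: sum(durs) for caller, durs in state["values"].items()}


def reduce_results(shard_results):
    # reduce_script: merge the partial sums, then compute [max, min, avg]
    final = {}
    for partial in shard_results:
        for caller, total in partial.items():
            final[caller] = final.get(caller, 0) + total
    totals = list(final.values())
    return [max(totals), min(totals), sum(totals) / len(totals)]


# Assumed document distribution: U1's steps live on both shards.
shard_a = [
    {"callerId": "U1", "duration": 1},
    {"callerId": "U2", "duration": 2},
    {"callerId": "U2", "duration": 1},
]
shard_b = [
    {"callerId": "U1", "duration": 1},
    {"callerId": "U1", "duration": 1200},
    {"callerId": "U2", "duration": 1030},
]

shard_results = []
for shard in (shard_a, shard_b):
    state = init_state()
    for doc in shard:
        map_doc(state, doc)
    shard_results.append(combine(state))

result = reduce_results(shard_results)
print(result)  # [1202, 1033, 1117.5]
```

Because one call's steps may land on different shards, the per-call sums are only complete after the reduce step merges the partial maps; that is why max, min and avg are computed there and not in the combine step.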

For the query above and the exact data you provided, the output is:

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 6,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "executionTimes": {
         "value": [
            1202,
            1033,
            "1117.5"
         ]
      }
   }
}

The order of the values in the value array is [max, min, avg], as defined by the script in reduce_script.
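On the client side, the array can be unpacked by position. A minimal sketch, assuming the response has already been parsed into a Python dict with the shape shown above; note that the average arrived as the string "1117.5" in that output, so it is converted explicitly.

```python
# Trimmed copy of the response shown above, already parsed from JSON.
response = {
    "aggregations": {
        "executionTimes": {
            "value": [1202, 1033, "1117.5"]
        }
    }
}

# Position in the array is fixed by reduce_script: [max, min, avg].
max_total, min_total, avg_raw = response["aggregations"]["executionTimes"]["value"]
avg_total = float(avg_raw)  # the average was serialized as a string

print(max_total, min_total, avg_total)
```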
