Elasticsearch: Possible to process aggregation results?

Question

I calculate the duration of my service processes using the sum aggregation. Each step of an executed process is saved in Elasticsearch under a calling ID.

This is what I monitor:

Duration of Request-Processing for ID #123 (calling service #1)

Duration of Server-Response for ID #123 (calling service #1)

**Complete Duration for ID #123**

Duration of Request-Processing for ID #124 (calling service #1)

Duration of Server-Response for ID #124 (calling service #1)

**Complete duration for ID #124**

The filter:

{
    "from": 0,
    "size": 0,
    "query": {
        "filtered": {
            "query": { "match_all": {} },
            "filter": {
                "term": {
                    "callingId": "123"
                }
            }
        }
    },
    "aggs": {
        "total_duration": { "sum": { "field": "duration" } },
        "max_duration": { "max": { "field": "duration" } },
        "min_duration": { "min": { "field": "duration" } }
    }
}

This returns the complete duration of the process and also tells me which part of the process was the fastest and which part was the slowest.

Next, I want to calculate the average duration of all finished processes by serviceId. In this case I only care about the total duration for each service, so that I can compare them.

How can I compute an average, minimum, and maximum from the total_durations?

I added some sample data; I hope you can work with it.

Call 1:

{
"callerId":"U1",
"operation":"Initialize",
"status":"INITIALIZED",
"duration":1,
"serviceId":"1"
}

{
"callerId":"U1",
"operation":"Calculate",
"status":"STARTED",
"duration":1,
"serviceId":"1"
}

{
"callerId":"U1",
"operation":"Finish",
"status":"FINISHED",
"duration":1200,
"serviceId":"1"
}

sum: 1202

Call 2:

{
"callerId":"U2",
"operation":"Initialize",
"status":"INITIALIZED",
"duration":2,
"serviceId":"1"
}

{
"callerId":"U2",
"operation":"Calculate",
"status":"STARTED",
"duration":1,
"serviceId":"1"
}

{
"callerId":"U2",
"operation":"Finish",
"status":"FINISHED",
"duration":1030,
"serviceId":"1"
}

sum: 1033

Aggregated over all service calls for service ID #1, this is what I want to calculate:

Max: 1202
Min: 1033
AVG: 1117.5

Recommended answer

A bit more complicated, but here it goes (only in Elasticsearch 1.4 and later, because it relies on the scripted metric aggregation):

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "serviceId": 1
        }
      }
    }
  },
  "aggs": {
    "executionTimes": {
      "scripted_metric": {
        "init_script": "_agg['values'] = new java.util.HashMap();",
        "map_script": "if (_agg.values[doc['callerId'].value]==null) {_agg.values[doc['callerId'].value]=[doc['duration'].value];} else {_agg.values[doc['callerId'].value].add(doc['duration'].value);}",
        "combine_script":"someHashMap = new java.util.HashMap();for(x in _agg.values.keySet()) {value=_agg.values[x]; sum=0; for(y in value) {sum+=y}; someHashMap.put(x,sum)}; return someHashMap;",
        "reduce_script": "finalArray = []; finalMap = new java.util.HashMap(); for(map in _aggs){for(x in map.keySet()){if(finalMap.containsKey(x)){value=finalMap.get(x);finalMap.put(x,value+map.get(x));} else {finalMap.put(x,map.get(x))}}}; finalAvgValue=0; finalMaxValue=-1; finalMinValue=-1; for(key in finalMap.keySet()){currentValue=finalMap.get(key);finalAvgValue+=currentValue; if(finalMinValue<0){finalMinValue=currentValue} else if(finalMinValue>currentValue){finalMinValue=currentValue}; if(currentValue>finalMaxValue) {finalMaxValue=currentValue}}; finalArray.add(finalMaxValue); finalArray.add(finalMinValue); finalArray.add(finalAvgValue/finalMap.size()); return finalArray",
        "lang": "groovy"
      }
    }
  }
}

I'm not saying this is the best approach, only that it is the one I could find. Nor am I saying the solution is in its best form; it can probably be cleaned up and improved. I wanted to show, though, that it is possible. Keep in mind that it is available only from 1.4 onward.

The basic idea of the approach is to use the scripts to build a data structure that holds the information you need, computed in separate steps as defined by the scripted metric aggregation. Note that the aggregation is performed for only one serviceId. If you want to do this for all serviceIds, you will probably need to rethink the data structure used in the scripts.
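
As a rough illustration of those steps, the following Python sketch mimics the four phases (init, map, combine, reduce) on the sample data from the question. It is not Elasticsearch code: the function names and the two-shard split are assumptions made for the example, and the map step stores a list of durations per callerId, which is what the combine step iterates over.

```python
# Sample documents from the question (all belong to serviceId "1").
DOCS = [
    {"callerId": "U1", "duration": 1},
    {"callerId": "U1", "duration": 1},
    {"callerId": "U1", "duration": 1200},
    {"callerId": "U2", "duration": 2},
    {"callerId": "U2", "duration": 1},
    {"callerId": "U2", "duration": 1030},
]


def init_state():
    # init_script: start each shard with an empty map.
    return {"values": {}}


def map_doc(state, doc):
    # map_script: collect every duration under its callerId.
    state["values"].setdefault(doc["callerId"], []).append(doc["duration"])


def combine(state):
    # combine_script: per shard, sum the durations of each callerId.
    return {caller: sum(durations) for caller, durations in state["values"].items()}


def reduce_shards(shard_results):
    # reduce_script: merge the per-shard sums, then derive [max, min, avg].
    totals = {}
    for shard in shard_results:
        for caller, total in shard.items():
            totals[caller] = totals.get(caller, 0) + total
    values = list(totals.values())
    return [max(values), min(values), sum(values) / len(values)]


def run(shards):
    # `shards` is a hypothetical split of the documents across shards.
    results = []
    for shard_docs in shards:
        state = init_state()
        for doc in shard_docs:
            map_doc(state, doc)
        results.append(combine(state))
    return reduce_shards(results)


# Split the documents over two pretend shards so that call U1 spans both.
print(run([DOCS[:2], DOCS[2:]]))
```

Because the reduce step merges the per-shard sums by callerId before computing the statistics, the result does not depend on how the documents happen to be distributed.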

For the query above and the exact data you provided, the output is this:

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 6,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "executionTimes": {
         "value": [
            1202,
            1033,
            "1117.5"
         ]
      }
   }
}

The order of the values in the value array is [max, min, avg], as defined by the script in reduce_script.
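
Since the result is a positional array, a client has to unpack it by index. The sketch below shows one way to do that in Python against a trimmed copy of the response above; the helper name `unpack_execution_times` is hypothetical, and the explicit `float` conversion is there because the average appears as a string in that output.

```python
import json

# Trimmed copy of the response shown above.
RESPONSE = """
{
  "aggregations": {
    "executionTimes": {
      "value": [1202, 1033, "1117.5"]
    }
  }
}
"""


def unpack_execution_times(response_text):
    # Positions follow the reduce_script: [max, min, avg].
    body = json.loads(response_text)
    max_value, min_value, avg_value = body["aggregations"]["executionTimes"]["value"]
    # The average is a string in the sample output, so convert it explicitly.
    return {"max": max_value, "min": min_value, "avg": float(avg_value)}


stats = unpack_execution_times(RESPONSE)
print(stats)
```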
