Elasticsearch: Possible to process aggregation results?
Question
I calculate the duration of my service processes using the sum aggregation. Each step of an executed process is saved in Elasticsearch under a calling ID.
This is what I monitor:
Duration of Request-Processing for ID #123 (calling service #1)
Duration of Server-Response for ID #123 (calling service #1)
**Complete Duration for ID #123**
Duration of Request-Processing for ID #124 (calling service #1)
Duration of Server-Response for ID #124 (calling service #1)
**Complete duration for ID #124**
The query:
{
  "from": 0,
  "size": 0,
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "term": {
          "callingId": "123"
        }
      }
    }
  },
  "aggs": {
    "total_duration": { "sum": { "field": "duration" } },
    "max_duration": { "max": { "field": "duration" } },
    "min_duration": { "min": { "field": "duration" } }
  }
}
This returns the complete duration of the process and also tells me which part of the process was the fastest and which was the slowest.
Next I want to calculate the average duration of all finished processes by serviceId. In this case I only care about the total duration for each service, so I can compare them.
How can I compute the average, minimum, and maximum of my total_durations?
Edit: I added some sample data; I hope you can work with it.
Call1:
{
"callerId":"U1",
"operation":"Initialize",
"status":"INITIALIZED",
"duration":1,
"serviceId":"1"
}
{
"callerId":"U1",
"operation":"Calculate",
"status":"STARTED",
"duration":1,
"serviceId":"1"
}
{
"callerId":"U1",
"operation":"Finish",
"status":"FINISHED",
"duration":1200,
"serviceId":"1"
}
sum: 1202
Call2:
{
"callerId":"U2",
"operation":"Initialize",
"status":"INITIALIZED",
"duration":2,
"serviceId":"1"
}
{
"callerId":"U2",
"operation":"Calculate",
"status":"STARTED",
"duration":1,
"serviceId":"1"
}
{
"callerId":"U2",
"operation":"Finish",
"status":"FINISHED",
"duration":1030,
"serviceId":"1"
}
sum: 1033
Aggregation over all service calls for service ID #1. This is what I want to calculate:
Max: 1202
Min: 1033
AVG: 1117.5
Recommended answer
A bit more complicated, but here it goes (this works only in Elasticsearch 1.4 or later, because it relies on the scripted metric aggregation):
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "serviceId": 1
        }
      }
    }
  },
  "aggs": {
    "executionTimes": {
      "scripted_metric": {
        "init_script": "_agg['values'] = new java.util.HashMap();",
        "map_script": "if (_agg.values[doc['callerId'].value]==null) {_agg.values[doc['callerId'].value]=[doc['duration'].value];} else {_agg.values[doc['callerId'].value].add(doc['duration'].value);}",
        "combine_script": "someHashMap = new java.util.HashMap();for(x in _agg.values.keySet()) {value=_agg.values[x]; sum=0; for(y in value) {sum+=y}; someHashMap.put(x,sum)}; return someHashMap;",
        "reduce_script": "finalArray = []; finalMap = new java.util.HashMap(); for(map in _aggs){for(x in map.keySet()){if(finalMap.containsKey(x)){value=finalMap.get(x);finalMap.put(x,value+map.get(x));} else {finalMap.put(x,map.get(x))}}}; finalAvgValue=0; finalMaxValue=-1; finalMinValue=-1; for(key in finalMap.keySet()){currentValue=finalMap.get(key);finalAvgValue+=currentValue; if(finalMinValue<0){finalMinValue=currentValue} else if(finalMinValue>currentValue){finalMinValue=currentValue}; if(currentValue>finalMaxValue) {finalMaxValue=currentValue}}; finalArray.add(finalMaxValue); finalArray.add(finalMinValue); finalArray.add(finalAvgValue/finalMap.size()); return finalArray",
        "lang": "groovy"
      }
    }
  }
}
Also, I'm not saying this is the best approach, only the one I could find, and the solution is probably not in its best form: it could likely be cleaned up and improved. I wanted to show, though, that it is possible. Keep in mind that it requires 1.4.
The basic idea of the approach is to use the scripts to build a data structure that holds the information you need, computed in the separate steps (init, map, combine, reduce) of the scripted metric aggregation. Note that the aggregation is performed for only one serviceId. If you want to do this for all serviceIds, you will probably need to rethink the data structure used in the scripts.
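To make the flow of those steps easier to follow, here is a minimal Python sketch that imitates them outside Elasticsearch. It is only an illustration: the function names run_shard and reduce_shards and the split of the sample documents across two shards are assumptions made for this example, not part of the Elasticsearch API.

```python
from collections import defaultdict

# Hypothetical split of the sample documents across two shards.
shards = [
    [
        {"callerId": "U1", "duration": 1},
        {"callerId": "U1", "duration": 1200},
        {"callerId": "U2", "duration": 2},
    ],
    [
        {"callerId": "U1", "duration": 1},
        {"callerId": "U2", "duration": 1},
        {"callerId": "U2", "duration": 1030},
    ],
]


def run_shard(docs):
    """Imitates init + map + combine on one shard: partial sum per callerId."""
    values = defaultdict(list)   # init step: empty per-shard state
    for doc in docs:             # map step: runs once per matching document
        values[doc["callerId"]].append(doc["duration"])
    # combine step: collapse each list into one partial sum
    return {caller: sum(durations) for caller, durations in values.items()}


def reduce_shards(partials):
    """Imitates reduce: merge per-shard maps, then compute [max, min, avg]."""
    totals = defaultdict(int)
    for partial in partials:
        for caller, subtotal in partial.items():
            totals[caller] += subtotal
    sums = list(totals.values())
    return [max(sums), min(sums), sum(sums) / len(sums)]


result = reduce_shards([run_shard(docs) for docs in shards])
print(result)  # [1202, 1033, 1117.5]
```

The merge in reduce_shards matters because documents of one call can end up on different shards, so a per-shard sum is only a partial total for that callerId.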
For the query above and the exact data you provided, the output is:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 6,
"max_score": 0,
"hits": []
},
"aggregations": {
"executionTimes": {
"value": [
1202,
1033,
"1117.5"
]
}
}
}
The order of values in the value array is [max, min, avg], as per the script in reduce_script.
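On the client side, the array can be unpacked by position. The following is a small sketch, assuming the response has already been parsed into a Python dict; the variable names are chosen for this example only.

```python
# Trimmed copy of the response shown above.
response = {
    "aggregations": {
        "executionTimes": {"value": [1202, 1033, "1117.5"]}
    }
}

# Positions follow the order the reduce script adds them: max, min, avg.
max_total, min_total, avg_total = response["aggregations"]["executionTimes"]["value"]
avg_total = float(avg_total)  # the average is a string in the output above

print(max_total, min_total, avg_total)  # 1202 1033 1117.5
```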