How to perform a pipeline aggregation without returning all buckets in Elasticsearch
Problem description
I'm using Elasticsearch 2.3 and I'm trying to perform a two-step computation using a pipeline aggregation. I'm only interested in the final result of my pipeline aggregation, but Elasticsearch returns all the bucket information. Since I have a huge number of buckets (tens or hundreds of millions), this is prohibitive. Unfortunately, I cannot find a way to tell Elasticsearch not to return all this information.

Here is a toy example. I have an index test-index with a document type obj. obj has two fields, key and value.

curl -XPOST 'http://10.10.0.7:9200/test-index/obj' -d '{
  "value": 100,
  "key": "foo"
}'
curl -XPOST 'http://10.10.0.7:9200/test-index/obj' -d '{
  "value": 20,
  "key": "foo"
}'
curl -XPOST 'http://10.10.0.7:9200/test-index/obj' -d '{
  "value": 50,
  "key": "bar"
}'
curl -XPOST 'http://10.10.0.7:9200/test-index/obj' -d '{
  "value": 60,
  "key": "bar"
}'
curl -XPOST 'http://10.10.0.7:9200/test-index/obj' -d '{
  "value": 70,
  "key": "bar"
}'

I want to get the average value (over all keys) of the minimum value of objs having the same key. An average of minima.

Elasticsearch allows me to do this:

curl -XPOST 'http://10.10.0.7:9200/test-index/obj/_search' -d '{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "key_aggregates": {
      "terms": {
        "field": "key",
        "size": 0
      },
      "aggs": {
        "min_value": {
          "min": {
            "field": "value"
          }
        }
      }
    },
    "avg_min_value": {
      "avg_bucket": {
        "buckets_path": "key_aggregates>min_value"
      }
    }
  }
}'

But this query returns the minimum for every bucket, although I don't need it:

{
  "took": 21,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0,
    "hits": [
    ]
  },
  "aggregations": {
    "key_aggregates": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "bar",
          "doc_count": 2,
          "min_value": {
            "value": 50
          }
        },
        {
          "key": "foo",
          "doc_count": 2,
          "min_value": {
            "value": 20
          }
        }
      ]
    },
    "avg_min_value": {
      "value": 35
    }
  }
}

Is there a way to get rid of all the information inside "buckets": [...]? I'm only interested in avg_min_value.

This might not seem like a problem in this toy example, but when the number of different keys is large (tens or hundreds of millions), the query response is prohibitively large, and I would like to prune it.

Is there a way to do this with Elasticsearch? Or am I modelling my data wrong?

NB: it is not acceptable to pre-aggregate my data per key, since the match_all part of my query might be replaced by complex and unknown filters.

NB2: changing size to a positive number in my terms aggregation is not acceptable because it would change the result.

Recommended answer

I had the same issue, and after quite a bit of research I found a solution that I thought I'd share here.

You can use the Response Filtering feature to select the part of the response that you want to receive. You should be able to achieve what you want by adding the query parameter filter_path=aggregations.avg_min_value to the search URL. In the example case, it should look similar to this:

curl -XPOST 'http://10.10.0.7:9200/test-index/obj/_search?filter_path=aggregations.avg_min_value' -d '{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "key_aggregates": {
      "terms": {
        "field": "key",
        "size": 0
      },
      "aggs": {
        "min_value": {
          "min": {
            "field": "value"
          }
        }
      }
    },
    "avg_min_value": {
      "avg_bucket": {
        "buckets_path": "key_aggregates>min_value"
      }
    }
  }
}'

PS: if you find another solution, would you mind sharing it here? Thanks!
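For anyone issuing the same request from a script rather than curl, here is a minimal Python sketch of the filter_path approach from the answer. It only uses the standard library. The host, index and type names are copied from the toy example; the helper names (build_search_url, build_body, run_search) are hypothetical and exist only for this illustration. The Content-Type header is an assumption added for newer clusters and is not part of the original curl command.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: host, index and type are the ones from the toy example.
BASE_URL = "http://10.10.0.7:9200/test-index/obj/_search"


def build_search_url(base_url, filter_path):
    """Append the filter_path query parameter to a search URL."""
    return base_url + "?" + urlencode({"filter_path": filter_path})


def build_body():
    """Same request body as the curl example: per-key minimum, then avg_bucket."""
    return {
        "size": 0,
        "query": {"match_all": {}},
        "aggregations": {
            "key_aggregates": {
                "terms": {"field": "key", "size": 0},
                "aggs": {"min_value": {"min": {"field": "value"}}},
            },
            "avg_min_value": {
                "avg_bucket": {"buckets_path": "key_aggregates>min_value"}
            },
        },
    }


def run_search(url, body):
    """POST the search and return the decoded JSON (needs a reachable cluster)."""
    request = Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption, see lead-in
        method="POST",
    )
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL does not touch the network.
url = build_search_url(BASE_URL, "aggregations.avg_min_value")
print(url)
# → http://10.10.0.7:9200/test-index/obj/_search?filter_path=aggregations.avg_min_value
```

Calling run_search(url, build_body()) against a live cluster should then return only the aggregations.avg_min_value part of the response, if the filter behaves as described in the answer. As far as I can tell, this trims only what is sent back over the wire; the cluster still has to compute every bucket.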