How to perform a pipeline aggregation without returning all buckets in Elasticsearch


Problem description

I'm using Elasticsearch 2.3 and I'm trying to perform a two-step computation using a pipeline aggregation. I'm only interested in the final result of my pipeline aggregation, but Elasticsearch returns all the bucket information.

Since I have a huge number of buckets (tens or hundreds of millions), this is prohibitive. Unfortunately, I cannot find a way to tell Elasticsearch not to return all this information.

Here is a toy example. I have an index test-index with a document type obj. obj has two fields, key and value.

curl -XPOST 'http://10.10.0.7:9200/test-index/obj' -d '{
  "value": 100,
  "key": "foo"
}'

curl -XPOST 'http://10.10.0.7:9200/test-index/obj' -d '{
  "value": 20,
  "key": "foo"
}'

curl -XPOST 'http://10.10.0.7:9200/test-index/obj' -d '{
  "value": 50,
  "key": "bar"
}'

curl -XPOST 'http://10.10.0.7:9200/test-index/obj' -d '{
  "value": 60,
  "key": "bar"
}'

curl -XPOST 'http://10.10.0.7:9200/test-index/obj' -d '{
  "value": 70,
  "key": "bar"
}'

I want to get the average (over all keys) of the minimum value of the objs sharing the same key. An average of minima.
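
To make the target quantity concrete, here is a minimal Python sketch that computes the same average of minima client-side for the toy documents above. The function name average_of_minima is hypothetical and only for illustration; this is not how Elasticsearch evaluates the aggregation.

```python
# Toy documents from the example above, as (key, value) pairs.
docs = [
    ("foo", 100),
    ("foo", 20),
    ("bar", 50),
    ("bar", 60),
    ("bar", 70),
]


def average_of_minima(pairs):
    """Return the mean of the per-key minimum values.

    Assumes `pairs` is non-empty; an empty input would divide by zero.
    """
    minima = {}
    for key, value in pairs:
        # Keep the smallest value seen so far for each key.
        minima[key] = min(value, minima.get(key, value))
    return sum(minima.values()) / len(minima)


print(average_of_minima(docs))  # 35.0 (min of foo is 20, min of bar is 50)
```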

Elasticsearch allows me to do this:

curl -XPOST 'http://10.10.0.7:9200/test-index/obj/_search' -d '{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "key_aggregates": {
      "terms": {
        "field": "key",
        "size": 0
      },
      "aggs": {
        "min_value": {
          "min": {
            "field": "value"
          }
        }
      }
    },
    "avg_min_value": {
      "avg_bucket": {
        "buckets_path": "key_aggregates>min_value"
      }
    }
  }
}'

But this query returns the minimum for every bucket, although I don't need it:

{
  "took": 21,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0,
    "hits": [

    ]
  },
  "aggregations": {
    "key_aggregates": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "bar",
          "doc_count": 2,
          "min_value": {
            "value": 50
          }
        },
        {
          "key": "foo",
          "doc_count": 2,
          "min_value": {
            "value": 20
          }
        }
      ]
    },
    "avg_min_value": {
      "value": 35
    }
  }
}

Is there a way to get rid of all the information inside "buckets": [...]? I'm only interested in avg_min_value.

This might not seem like a problem in this toy example, but when the number of different keys is large (tens or hundreds of millions), the query response is prohibitively large, and I would like to prune it.

Is there a way to do this with Elasticsearch? Or am I modelling my data wrong?

NB: it is not acceptable to pre-aggregate my data per key, since the match_all part of my query might be replaced by complex and unknown filters.

NB2: changing size to a non-zero number in my terms aggregation is not acceptable because it would limit the number of buckets and so change the result.

Solution

I had the same issue and after doing quite a bit of research I found a solution and thought I'd share here.

You can use the Response Filtering feature to filter the part of the answer that you want to receive.

You should be able to achieve what you want by adding the query parameter filter_path=aggregations.avg_min_value to the search URL. In the example case, it should look similar to this:

curl -XPOST 'http://10.10.0.7:9200/test-index/obj/_search?filter_path=aggregations.avg_min_value' -d '{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "key_aggregates": {
      "terms": {
        "field": "key",
        "size": 0
      },
      "aggs": {
        "min_value": {
          "min": {
            "field": "value"
          }
        }
      }
    },
    "avg_min_value": {
      "avg_bucket": {
        "buckets_path": "key_aggregates>min_value"
      }
    }
  }
}'
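
To illustrate the shape of the response you should expect, here is a rough Python sketch that mimics the effect of filter_path on the sample response from the question. The helper filter_by_path is hypothetical and heavily simplified (one dotted path, no wildcards, objects only); it is not Elasticsearch's implementation. Note that the filtering only trims what is sent back, so the buckets are presumably still computed on the server.

```python
def filter_by_path(response, path):
    """Keep only the branch of `response` named by a dotted path.

    Simplified illustration: a single path, no wildcards, dicts only.
    Returns an empty dict when the path does not exist.
    """
    keys = path.split(".")
    node = response
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return {}
        node = node[key]
    # Rebuild the nesting around the selected value.
    result = node
    for key in reversed(keys):
        result = {key: result}
    return result


# Abbreviated copy of the unfiltered response shown in the question.
sample = {
    "took": 21,
    "aggregations": {
        "key_aggregates": {
            "buckets": [
                {"key": "bar", "doc_count": 2, "min_value": {"value": 50}},
                {"key": "foo", "doc_count": 2, "min_value": {"value": 20}},
            ]
        },
        "avg_min_value": {"value": 35},
    },
}

print(filter_by_path(sample, "aggregations.avg_min_value"))
# {'aggregations': {'avg_min_value': {'value': 35}}}
```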

PS: if you find another solution, would you mind sharing it here? Thanks!
