Filter Elasticsearch Aggregation by Bucket Key Value
Question
I have an Elasticsearch index of documents in which there is a field that contains a list of URLs. Aggregating on this field gives me the count of unique URLs, as expected.
GET models*/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "links": {
      "terms": {
        "field": "links.keyword",
        "size": 10
      }
    }
  }
}
I then want to filter out the buckets whose keys do not contain a certain string. I've tried doing so with the Bucket Selector Aggregation.
This attempt:
GET models*/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "links": {
      "terms": {
        "field": "links.keyword",
        "size": 10
      }
    },
    "links_key_filter": {
      "bucket_selector": {
        "buckets_path": {
          "key": "links"
        },
        "script": "!key.contains('foo')"
      }
    }
  }
}
fails with:
Invalid pipeline aggregation named [links_key_filter] of type [bucket_selector]. Only sibling pipeline aggregations are allowed at the top level
Putting the bucket selector inside the links aggregation, like so:
GET models*/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "links": {
      "terms": {
        "field": "links.keyword",
        "size": 10
      },
      "bucket_selector": {
        "buckets_path": {
          "key": "links"
        },
        "script": "!key.contains('foo')"
      }
    }
  }
}
fails with:
Found two aggregation type definitions in [links]: [terms] and [bucket_selector]
I'm going to keep tinkering but am a bit stuck at the moment :(
Recommended answer
You cannot use bucket_selector here because its buckets_path must reference either a number value or a single-value numeric metric aggregation, and what a terms aggregation produces is denoted as StringTerms. That simply won't work, regardless of whether you force a placeholder multi-bucket aggregation or not.
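For contrast, here is a minimal Python sketch of a request body in which bucket_selector is accepted: it sits under the nested "aggs" of the terms aggregation (which avoids both errors from the question) and its buckets_path points at the numeric _count of each bucket. The names count_filter and count_filtered_terms_agg are made up for illustration, and the threshold is arbitrary. This filters on document counts only; it still cannot see the bucket key as a string.

```python
import json


def count_filtered_terms_agg(field, more_than):
    """Build a terms aggregation whose buckets are kept only if doc_count > more_than.

    The bucket_selector is a parent pipeline aggregation, so it lives inside
    the "aggs" of the terms aggregation, not next to "terms" and not at the
    top level.
    """
    return {
        "terms": {"field": field, "size": 10},
        "aggs": {
            # "count_filter" is a hypothetical name chosen for this sketch.
            "count_filter": {
                "bucket_selector": {
                    # _count is numeric, which is what buckets_path requires.
                    "buckets_path": {"count": "_count"},
                    "script": "params.count > %d" % more_than,
                }
            }
        },
    }


body = {
    "query": {"match_all": {}},
    "size": 0,
    "aggs": {"links": count_filtered_terms_agg("links.keyword", 1)},
}

# Print the JSON that would be sent to models*/_search.
print(json.dumps(body, indent=2))
```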
Assuming that your links are arrays of keywords:
POST models/_doc/1
{
  "links": [
    "google.com",
    "wikipedia.org"
  ]
}

POST models/_doc/2
{
  "links": [
    "reddit.com",
    "google.com"
  ]
}
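and you'd like to group everything except the links containing reddit, you can use the exclude parameter of the terms aggregation, which accepts a regular expression: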
POST models*/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "links": {
      "terms": {
        "field": "links.keyword",
        "exclude": ".*reddit.*",
        "size": 10
      }
    }
  }
}
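The question actually asked for the opposite direction (keep only the keys that contain a string), which the terms aggregation covers with its include parameter. Below is a rough Python sketch that builds either variant from a plain substring. The helper names escape_lucene_regex and links_agg are hypothetical, and the set of escaped characters assumes that all optional Lucene regex operators may be enabled, so it errs on the side of escaping too much.

```python
import json

# Characters with special meaning in Lucene regular expressions.
# Assumption: optional operators (# @ & < > ~) are treated as reserved too.
_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene_regex(text):
    """Backslash-escape reserved characters so text is matched literally."""
    return "".join("\\" + ch if ch in _RESERVED else ch for ch in text)


def links_agg(substring, keep_matching=True, field="links.keyword", size=10):
    """Build a terms aggregation that keeps (include) or drops (exclude)
    the bucket keys containing the given substring."""
    pattern = ".*" + escape_lucene_regex(substring) + ".*"
    param = "include" if keep_matching else "exclude"
    return {"terms": {"field": field, param: pattern, "size": size}}


# Keep only the buckets whose key contains "foo", as in the question.
body = {
    "query": {"match_all": {}},
    "size": 0,
    "aggs": {"links": links_agg("foo")},
}
print(json.dumps(body, indent=2))
```

Calling links_agg("reddit", keep_matching=False) yields the same exclude pattern as the console request above.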
BTW, there are some non-trivial implications arising from the usage of such regexes, especially when the match has to ignore case and you need a regex generated at query time, as discussed in How to correctly query inside of terms aggregate values in elasticsearch, using include and regex?
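As an illustration of what a query-time-generated regex can look like, here is a small Python sketch that expands every letter of the search string into a two-character class so the pattern matches regardless of case. This is one possible workaround under the assumption that the field is indexed as-is (no lowercasing applied), not necessarily the approach taken in the linked thread; the function name case_insensitive_pattern is made up.

```python
# Characters with special meaning in Lucene regular expressions.
# Assumption: optional operators (# @ & < > ~) are treated as reserved too.
_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def case_insensitive_pattern(substring):
    """Return a regex matching any key that contains substring in any casing.

    Each cased letter becomes a class such as [rR]; reserved characters are
    backslash-escaped; everything else is copied unchanged.
    """
    parts = []
    for ch in substring:
        lower, upper = ch.lower(), ch.upper()
        if lower != upper and len(lower) == 1 and len(upper) == 1:
            parts.append("[" + lower + upper + "]")
        elif ch in _RESERVED:
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return ".*" + "".join(parts) + ".*"


# The resulting string can be used as the value of "include" or "exclude".
print(case_insensitive_pattern("reddit"))
```

The pattern grows with the length of the input, which is part of why this approach has a cost at query time.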