Filter Elasticsearch Aggregation by Bucket Key Value

Question

I have an Elasticsearch index of documents in which there is a field that contains a list of URLs. Aggregating on this field gives me the unique URLs along with their document counts, as expected.

GET models*/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "links": {
      "terms": {
        "field": "links.keyword",
        "size": 10
      }
    }
  }
}
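To make the shape of that response concrete, here is a small Python model of what a terms aggregation computes for an array-valued keyword field. It is a sketch, not Elasticsearch code: the documents are hypothetical, it runs over one in-memory list (on a multi-shard index, Elasticsearch's doc_count values can be approximate), and breaking ties by key is this sketch's own choice.

```python
from collections import Counter


def terms_agg(docs, field, size=10):
    """Toy model of a terms aggregation on an array-valued keyword field.

    Every distinct value becomes a bucket. A bucket's doc_count is the number
    of documents that contain the value; a document counts once even if the
    value appears in its array more than once.
    """
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.get(field, [])))
    # Order by doc_count descending, breaking ties by key (this sketch's choice).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered[:size]]


# Hypothetical documents; "links" plays the role of links.keyword.
docs = [
    {"links": ["google.com", "wikipedia.org"]},
    {"links": ["reddit.com", "google.com"]},
]
for bucket in terms_agg(docs, "links"):
    print(bucket)
```

Each bucket's key is a string, which matters for the filtering attempts below.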

I then want to filter out the buckets whose keys do not contain a certain string. I've tried doing so with the Bucket Selector Aggregation.

This attempt:

GET models*/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "links": {
      "terms": {
        "field": "links.keyword",
        "size": 10
      }
    },
    "links_key_filter": {
      "bucket_selector": {
        "buckets_path": {
          "key": "links"
        },
        "script": "!key.contains('foo')"
      }
    }
  }
}

fails with:

Invalid pipeline aggregation named [links_key_filter] of type [bucket_selector]. Only sibling pipeline aggregations are allowed at the top level

Putting the bucket selector inside the links aggregation, like so:

GET models*/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "links": {
      "terms": {
        "field": "links.keyword",
        "size": 10
      },
      "bucket_selector": {
        "buckets_path": {
          "key": "links"
        },
        "script": "!key.contains('foo')"
      }
    }
  }
}

fails with:

Found two aggregation type definitions in [links]: [terms] and [bucket_selector]

I'm going to keep tinkering but am a bit stuck at the moment :(

Answer

You can't use bucket_selector here because its buckets_path must reference either a numeric value or a single-value numeric metric aggregation, and what a terms aggregation produces is denoted as StringTerms. That simply won't work, regardless of whether you force a placeholder multibucket aggregation or not.
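For contrast, this is roughly where a bucket_selector does fit: nested under the terms aggregation's own "aggs" object (it is a parent pipeline aggregation, so it can go neither at the top level nor next to "terms" in the same object), reading a numeric value from each bucket. The Python below only builds the request body as a dict; the aggregation name min_doc_filter and the threshold are hypothetical. It uses the special "_count" buckets_path, which resolves to each bucket's doc_count, so it can keep buckets by document count but not by key, which is exactly the limitation described above.

```python
import json


def build_request(min_docs):
    """Request body: terms buckets on links.keyword, keeping only buckets
    with at least `min_docs` documents (a numeric test, not a key test)."""
    return {
        "size": 0,
        "aggs": {
            "links": {
                "terms": {"field": "links.keyword", "size": 10},
                # Parent pipeline aggregations live in the sub-"aggs" object.
                "aggs": {
                    "min_doc_filter": {  # hypothetical name
                        "bucket_selector": {
                            # "_count" resolves to the bucket's doc_count.
                            "buckets_path": {"count": "_count"},
                            "script": f"params.count >= {min_docs}",
                        }
                    }
                },
            }
        },
    }


print(json.dumps(build_request(2), indent=2))
```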

Assuming that your links are arrays of keywords:

POST models/_doc/1
{
  "links": [
    "google.com",
    "wikipedia.org"
  ]
}

POST models/_doc/2
{
  "links": [
    "reddit.com",
    "google.com"
  ]
}

and you'd like to aggregate everything except reddit, you can pass the following regex to the terms aggregation's exclude parameter:

POST models*/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "links": {
      "terms": {
        "field": "links.keyword",
        "exclude": ".*reddit.*",
        "size": 10
      }
    }
  }
}
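A rough way to see what exclude (and its counterpart include) does: the Python sketch below applies both to the buckets the two sample documents would produce. Assumptions worth flagging: Python's re stands in for Lucene's regex dialect, which is similar but not identical; re.fullmatch mirrors the fact that these patterns must match the whole term, which is why ".*reddit.*" carries a leading and trailing .*; and, as in Elasticsearch, exclude takes precedence when both are set. Because the sketch filters a complete bucket list, it also reflects that Elasticsearch drops excluded terms before picking the top size buckets, so they never use up bucket slots.

```python
import re


def filter_buckets(buckets, include=None, exclude=None):
    """Keep buckets whose key fully matches `include` (if given) and does not
    fully match `exclude` (if given); exclude takes precedence."""
    kept = []
    for bucket in buckets:
        key = bucket["key"]
        if include is not None and re.fullmatch(include, key) is None:
            continue
        if exclude is not None and re.fullmatch(exclude, key) is not None:
            continue
        kept.append(bucket)
    return kept


# Buckets a terms aggregation would return for the two sample documents.
buckets = [
    {"key": "google.com", "doc_count": 2},
    {"key": "reddit.com", "doc_count": 1},
    {"key": "wikipedia.org", "doc_count": 1},
]

print(filter_buckets(buckets, exclude=".*reddit.*"))
# Without the surrounding .*, the pattern must equal the whole key,
# so "reddit" alone excludes nothing here.
print(filter_buckets(buckets, exclude="reddit"))
```

For the goal stated in the question (drop buckets whose keys do not contain foo), the same idea applies with "include": ".*foo.*" in place of exclude.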

BTW, there are some non-trivial implications arising from the use of such regexes, especially in a case-sensitive scenario in which you'd need a query-time-generated regex, as discussed in How to correctly query inside of terms aggregate values in elasticsearch, using include and regex?
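As an illustration of that case-sensitivity point: assuming links.keyword has no lowercasing normalizer, its values are matched exactly, so excluding reddit regardless of case means generating a pattern such as .*[rR][eE][dD][dD][iI][tT].* at query time. The helper below is a hypothetical sketch, not part of any Elasticsearch client. It turns each cased letter into a two-letter character class and backslash-escapes every other non-alphanumeric character, relying on Lucene's regex syntax treating a backslash followed by any character as that literal character. It is checked here with Python's re, whose syntax overlaps with Lucene's for these constructs.

```python
import json
import re


def case_insensitive_contains(term):
    """Return a regex that fully matches any key containing `term`, ignoring case."""
    parts = []
    for ch in term:
        lower, upper = ch.lower(), ch.upper()
        if lower != upper and len(lower) == 1 and len(upper) == 1:
            parts.append(f"[{lower}{upper}]")  # e.g. "r" -> "[rR]"
        elif ch.isalnum():
            parts.append(ch)  # digits and caseless letters stay as-is
        else:
            parts.append("\\" + ch)  # "." -> "\." so it is matched literally
    return ".*" + "".join(parts) + ".*"


pattern = case_insensitive_contains("Reddit")
print(pattern)

# Hypothetical request body using the generated pattern.
body = {
    "size": 0,
    "aggs": {
        "links": {
            "terms": {"field": "links.keyword", "exclude": pattern, "size": 10}
        }
    },
}
print(json.dumps(body, indent=2))
```

Generating one character class per letter keeps the pattern within plain regex syntax, at the cost of longer patterns and slower matching on fields with many distinct terms.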
