How to get an Elasticsearch aggregation with multiple fields


Question

I'm trying to find tags related to the one currently being viewed. Every document in our index is tagged, and each tag has two parts, an ID and a text name:

{
    ...
    meta: {
        ...
        tags: [
            {
                id: 123,
                name: 'Biscuits'
            },
            {
                id: 456,
                name: 'Cakes'
            },
            {
                id: 789,
                name: 'Breads'
            }
        ]
    }
}

To fetch the related tags, I simply query the documents and aggregate on their tags:

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "item.meta.tags.id": "123"
                    }
                },
                {
                    ...
                }
            ]
        }
    },
    "aggs": {
        "baked_goods": {
            "terms": {
                "field": "item.meta.tags.id",
                "min_doc_count": 2
            }
        }
    }
}

This works perfectly and returns the results I want. However, I need both the tag ID and the tag name to do anything useful. From what I have found, the possible solutions seem to be:

  1. Combining the fields at index time
  2. A script that munges the fields together
  3. A nested aggregation

Options 1 and 2 are not available to me, so I have been going with option 3, but it does not respond the way I expected. Given the following query (still searching for documents also tagged with 'Biscuits'):

{
    ...
    "aggs": {
        "baked_goods": {
            "terms": {
                "field": "item.meta.tags.id",
                "min_doc_count": 2
            },
            "aggs": {
                "name": {
                    "terms": {
                        "field": "item.meta.tags.name"
                    }
                }
            }
        }
    }
}

I get this result:

{
    ...
    "aggregations": {
        "baked_goods": {
            "buckets": [
                {
                    "key": "456",
                    "doc_count": 11,
                    "name": {
                        "buckets": [
                            {
                                "key": "Biscuits",
                                "doc_count": 11
                            },
                            {
                                "key": "Cakes",
                                "doc_count": 11
                            }
                        ]
                    }
                }
            ]
        }
    }
}

The nested sub-aggregation includes both the name of the search term and the name of the tag I'm after (returned in alphabetical order).

I tried to mitigate this by adding an exclude to the nested aggregation, but that slowed the query down far too much (roughly 100 times slower for 500,000 docs). So far, the fastest solution is to de-dupe the result manually.

What is the best way to get an aggregation of tags with both the tag ID and the tag name in the response?

Thanks for making it this far!

Recommended answer

By the looks of it, your tags field is not nested. For this aggregation to work, it needs to be nested so that each id stays associated with its name. Without nested, the list of ids is just one array and the list of names is another, unrelated array. A mapping along these lines makes tags nested:

    "item": {
      "properties": {
        "meta": {
          "properties": {
            "tags": {
              "type": "nested",           <-- nested field
              "include_in_parent": true,  <-- to, also, keep the flat array-like structure
              "properties": {
                "id": {
                  "type": "integer"
                },
                "name": {
                  "type": "string"
                }
              }
            }
          }
        }
      }
    }
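To make the loss of association concrete, here is a minimal Python sketch of what happens to a non-nested array of objects. It is only a model of the flattening, not Elasticsearch code, and the helper name flatten_object_array is hypothetical.

```python
from collections import defaultdict


def flatten_object_array(doc, path):
    """Model of non-nested indexing: every sub-field of the objects
    under `path` becomes one independent multi-valued field."""
    flat = defaultdict(list)
    for obj in doc[path]:
        for key, value in obj.items():
            flat[f"{path}.{key}"].append(value)
    return dict(flat)


# Hypothetical document with two of the tags from the question.
doc = {"tags": [{"id": 123, "name": "Biscuits"}, {"id": 456, "name": "Cakes"}]}

flat = flatten_object_array(doc, "tags")
print(flat)
# {'tags.id': [123, 456], 'tags.name': ['Biscuits', 'Cakes']}

# A sub-aggregation on tags.name under any id bucket sees every name in
# the document, because nothing links 456 to 'Cakes' any more.
names_under_id = {tag_id: flat["tags.name"] for tag_id in flat["tags.id"]}
print(names_under_id[456])
# ['Biscuits', 'Cakes']
```

This mirrors the result in the question, where the bucket for id 456 lists both Biscuits and Cakes.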

Also, note that I've added the line "include_in_parent": true to the mapping, which means that your nested tags will also behave like a "flat" array-like structure.

So everything you have in your queries so far will still work without any changes.

For this particular query, however, the aggregation needs to change to something like this:

{
  "aggs": {
    "baked_goods": {
      "nested": {
        "path": "item.meta.tags"
      },
      "aggs": {
        "name": {
          "terms": {
            "field": "item.meta.tags.id"
          },
          "aggs": {
            "name": {
              "terms": {
                "field": "item.meta.tags.name"
              }
            }
          }
        }
      }
    }
  }
}

The result looks like this:

   "aggregations": {
      "baked_goods": {
         "doc_count": 9,
         "name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
               {
                  "key": 123,
                  "doc_count": 3,
                  "name": {
                     "doc_count_error_upper_bound": 0,
                     "sum_other_doc_count": 0,
                     "buckets": [
                        {
                           "key": "biscuits",
                           "doc_count": 3
                        }
                     ]
                  }
               },
               {
                  "key": 456,
                  "doc_count": 2,
                  "name": {
                     "doc_count_error_upper_bound": 0,
                     "sum_other_doc_count": 0,
                     "buckets": [
                        {
                           "key": "cakes",
                           "doc_count": 2
                        }
                     ]
                  }
               },
               .....
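On the client side, a response of this shape can be reduced to (id, name, count) tuples. The following Python sketch assumes the aggregation names used above (baked_goods for the nested aggregation, name for both terms aggregations) and that each id bucket holds a single name. The helper tag_pairs and the sample dict are hypothetical, trimmed from the output shown.

```python
def tag_pairs(response, outer="baked_goods", by_id="name", by_name="name"):
    """Collect (tag id, tag name, doc_count) from the nested aggregation."""
    pairs = []
    id_buckets = response["aggregations"][outer][by_id]["buckets"]
    for id_bucket in id_buckets:
        name_buckets = id_bucket[by_name]["buckets"]
        # With a nested mapping each id bucket should hold one name;
        # fall back to None if the inner bucket list is empty.
        name = name_buckets[0]["key"] if name_buckets else None
        pairs.append((id_bucket["key"], name, id_bucket["doc_count"]))
    return pairs


# Sample trimmed from the result above.
sample = {
    "aggregations": {
        "baked_goods": {
            "doc_count": 9,
            "name": {
                "buckets": [
                    {"key": 123, "doc_count": 3,
                     "name": {"buckets": [{"key": "biscuits", "doc_count": 3}]}},
                    {"key": 456, "doc_count": 2,
                     "name": {"buckets": [{"key": "cakes", "doc_count": 2}]}},
                ]
            },
        }
    }
}

all_tags = tag_pairs(sample)
# Drop the tag currently being viewed (id 123) to keep only related tags.
related = [pair for pair in all_tags if pair[0] != 123]
print(related)
# [(456, 'cakes', 2)]
```

Filtering out the viewed tag in application code avoids the exclude option that the question reports as slow.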
