How to get an Elasticsearch aggregation with multiple fields


Problem description

I'm attempting to find tags related to the one currently being viewed. Every document in our index is tagged, and each tag consists of two parts, an ID and a text name:

{
    ...
    "meta": {
        ...
        "tags": [
            {
                "id": 123,
                "name": "Biscuits"
            },
            {
                "id": 456,
                "name": "Cakes"
            },
            {
                "id": 789,
                "name": "Breads"
            }
        ]
    }
}

To fetch the related tags, I simply query the documents and run a terms aggregation over their tags:

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "item.meta.tags.id": "123"
                    }
                },
                {
                    ...
                }
            ]
        }
    },
    "aggs": {
        "baked_goods": {
            "terms": {
                "field": "item.meta.tags.id",
                "min_doc_count": 2
            }
        }
    }
}

This works perfectly; I get the results I want. However, I need both the tag ID and the name to do anything useful. I have explored how to accomplish this, and the solutions seem to be:


  1. Combine the fields at index time

  2. A script that concatenates the fields

  3. Nested aggregation

Options one and two are not available to me, so I have gone with option 3, but it does not respond in the expected manner. Given the following query (still searching for documents also tagged with 'Biscuits'):

{
    ...
    "aggs": {
        "baked_goods": {
            "terms": {
                "field": "item.meta.tags.id",
                "min_doc_count": 2
            },
            "aggs": {
                "name": {
                    "terms": {
                        "field": "item.meta.tags.name"
                    }
                }
            }
        }
    }
}

I get this result:

{
    ...
    "aggregations": {
        "baked_goods": {
            "buckets": [
                {
                    "key": "456",
                    "doc_count": 11,
                    "name": {
                        "buckets": [
                            {
                                "key": "Biscuits",
                                "doc_count": 11
                            },
                            {
                                "key": "Cakes",
                                "doc_count": 11
                            }
                        ]
                    }
                }
            ]
        }
    }
}

The nested aggregation includes both the search term and the tag I'm after (returned in alphabetical order).

I have tried to mitigate this by adding an exclude to the nested aggregation, but that slowed the query down far too much (around 100 times slower on 500,000 documents). So far the fastest solution is to de-dupe the results manually.
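The manual de-dupe could look something like the following Python sketch. It assumes each tag id maps to exactly one name, so a tag's own name shows up in every document of its id bucket, and it drops the name of the tag being searched. `pick_tag_name` and `related_tags` are hypothetical helpers, and the heuristic returns `None` when another tag happens to co-occur in every document of a bucket.

```python
from typing import Optional


def pick_tag_name(id_bucket: dict, searched_name: str) -> Optional[str]:
    """Heuristic de-dupe for the flat (non-nested) id -> name sub-aggregation.

    Keeps names that occur in every document of the id bucket, drops the
    searched tag's name, and returns the survivor only if it is unambiguous.
    """
    total = id_bucket["doc_count"]
    candidates = [
        b["key"]
        for b in id_bucket["name"]["buckets"]
        # Assumption: the tag's own name appears in every document of its id bucket.
        if b["doc_count"] == total and b["key"] != searched_name
    ]
    return candidates[0] if len(candidates) == 1 else None


def related_tags(aggregations: dict, searched_name: str) -> dict:
    """Map each related tag id to its guessed name (None if ambiguous)."""
    return {
        bucket["key"]: pick_tag_name(bucket, searched_name)
        for bucket in aggregations["baked_goods"]["buckets"]
    }


# Response shape copied from the question (elided parts left out).
response_aggs = {
    "baked_goods": {
        "buckets": [
            {
                "key": "456",
                "doc_count": 11,
                "name": {
                    "buckets": [
                        {"key": "Biscuits", "doc_count": 11},
                        {"key": "Cakes", "doc_count": 11},
                    ]
                },
            }
        ]
    }
}

print(related_tags(response_aggs, "Biscuits"))  # {'456': 'Cakes'}
```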

What is the best way to get an aggregation of tags with both the tag ID and tag name in the response?

Thanks!

Recommended answer

By the looks of it, your tags field is not mapped as nested. For this aggregation to work it needs to be nested, so that each id stays associated with its name. Without nested, the ids are indexed as one array and the names as another, with nothing linking them. The mapping should look like this:

    "item": {
      "properties": {
        "meta": {
          "properties": {
            "tags": {
              "type": "nested",           <-- nested field
              "include_in_parent": true,  <-- to, also, keep the flat array-like structure
              "properties": {
                "id": {
                  "type": "integer"
                },
                "name": {
                  "type": "string"
                }
              }
            }
          }
        }
      }
    }

Also note that I've added the line "include_in_parent": true to the mapping, which means your nested tags will also be indexed as a "flat", array-like structure on the parent document.

So everything in your existing queries will keep working, with no changes needed.
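A small in-memory model of the difference shows why the sub-aggregation in the question mixed names together. This is plain Python with no Elasticsearch involved; the sample documents and function names are made up for illustration. On flattened arrays every name in a document lands under every id in that document, while in the nested model each tag object is counted on its own, the way nested fields are stored as separate hidden documents.

```python
from collections import Counter, defaultdict

# Hypothetical sample documents: each is a list of {id, name} tag objects.
docs = [
    [{"id": 123, "name": "Biscuits"}, {"id": 456, "name": "Cakes"}],
    [{"id": 123, "name": "Biscuits"}, {"id": 456, "name": "Cakes"}],
    [{"id": 123, "name": "Biscuits"}, {"id": 789, "name": "Breads"}],
]


def flat_terms_by_id(docs):
    """Model terms(id) -> terms(name) on flattened arrays (no nested mapping).

    Each document becomes two independent arrays, so every name in the
    document is counted under every id in the document (once per document).
    """
    result = defaultdict(Counter)
    for tags in docs:
        ids = {t["id"] for t in tags}
        names = sorted({t["name"] for t in tags})  # sorted for stable output
        for tag_id in ids:
            result[tag_id].update(names)
    return {k: dict(v) for k, v in result.items()}


def nested_terms_by_id(docs):
    """Model nested -> terms(id) -> terms(name): each tag object is its own
    document, so a name is only counted under the id it was paired with."""
    result = defaultdict(Counter)
    for tags in docs:
        for t in tags:
            result[t["id"]][t["name"]] += 1
    return {k: dict(v) for k, v in result.items()}


print(flat_terms_by_id(docs)[456])    # {'Biscuits': 2, 'Cakes': 2}
print(nested_terms_by_id(docs)[456])  # {'Cakes': 2}
```

The flat result for id 456 reproduces the symptom from the question: the searched tag's name appears alongside the one you actually want.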

For this particular query, however, the aggregation needs to change to something like this:

{
  "aggs": {
    "baked_goods": {
      "nested": {
        "path": "item.meta.tags"
      },
      "aggs": {
        "name": {
          "terms": {
            "field": "item.meta.tags.id"
          },
          "aggs": {
            "name": {
              "terms": {
                "field": "item.meta.tags.name"
              }
            }
          }
        }
      }
    }
  }
}

The result looks like this:

   "aggregations": {
      "baked_goods": {
         "doc_count": 9,
         "name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
               {
                  "key": 123,
                  "doc_count": 3,
                  "name": {
                     "doc_count_error_upper_bound": 0,
                     "sum_other_doc_count": 0,
                     "buckets": [
                        {
                           "key": "biscuits",
                           "doc_count": 3
                        }
                     ]
                  }
               },
               {
                  "key": 456,
                  "doc_count": 2,
                  "name": {
                     "doc_count_error_upper_bound": 0,
                     "sum_other_doc_count": 0,
                     "buckets": [
                        {
                           "key": "cakes",
                           "doc_count": 2
                        }
                     ]
                  }
               },
               .....
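On the client side, the nested response can then be turned into (id, name, doc_count) tuples. The sketch below assumes the aggregation names used above ("baked_goods", with "name" at both levels) and that each id bucket carries a single name sub-bucket; `tag_pairs` is a hypothetical helper, and the sample input is the visible part of the response above.

```python
def tag_pairs(aggregations: dict) -> list:
    """Flatten the nested -> terms(id) -> terms(name) response into tuples.

    Assumes each id bucket has at most one name sub-bucket, which is what
    the nested mapping should produce when ids and names are 1:1.
    """
    pairs = []
    for id_bucket in aggregations["baked_goods"]["name"]["buckets"]:
        name_buckets = id_bucket["name"]["buckets"]
        # Fall back to None if the name sub-aggregation came back empty.
        name = name_buckets[0]["key"] if name_buckets else None
        pairs.append((id_bucket["key"], name, id_bucket["doc_count"]))
    return pairs


aggregations = {
    "baked_goods": {
        "doc_count": 9,
        "name": {
            "buckets": [
                {"key": 123, "doc_count": 3,
                 "name": {"buckets": [{"key": "biscuits", "doc_count": 3}]}},
                {"key": 456, "doc_count": 2,
                 "name": {"buckets": [{"key": "cakes", "doc_count": 2}]}},
            ]
        },
    }
}

print(tag_pairs(aggregations))  # [(123, 'biscuits', 3), (456, 'cakes', 2)]
```

Keep in mind that doc_count values under a nested aggregation count tag objects rather than top-level documents; Elasticsearch's reverse_nested aggregation is the usual way to get parent-document counts if you need them. The lowercase names are presumably because "name" is an analyzed string field.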
