Elasticsearch aggregation of deep nested type

Problem description

Previously I have asked this question.

The example document there was simplified. That helped me understand how aggregation over a non-nested type differs from aggregation over a nested type. However, the simplification hid further complexity, so I have to expand on the question here.

So my actual documents are closer to the following:

"_source": {
    "keyword": "my keyword",
    "response": [
        {
            "results": [
                {
                    "items": [
                        {
                            "prop": [
                                {
                                    "item_property_1": ["A"]
                                }
                            ]
                            ( ... other properties )
                        },
                        {
                            "prop": [
                                {
                                    "item_property_1": ["B"]
                                }
                            ]
                            ( ... other properties )
                        },
                        ( ... other items )
                    ]
                }
            ],
            ( ... other properties )
        }
    ]
}

So I kept the crucial properties keyword, items, and item_property_1, but hid lots of other things that complicate the situation. First, notice that compared to the referenced question there is lots of extra nesting: between the root and "items", and between "items" and "item_property_1". Additionally, notice also that the properties response and results are both arrays with a single element. It's weird, but that's how it is :-)

Now, the reason why this question is different from the one cited above is that I tried the accepted answer (which does work for the example there), and it doesn't work here. That is, if I use a mapping with:

"items": {
    "type":"nested",
    "properties": {
        "prop": {
            "properties": {
                "item_property_1": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}

then the aggregation doesn't work. It returns zero hits.

I will edit later and provide a ready-to-use sample bulk insert.

EDIT:
Alright, below I show three queries which are respectively: mapping, bulk insert and aggregation (with zero hits)

Mapping (with "type":"nested" as indicated in the previous answered question)

PUT /test2/_mapping/test3
{
    "test3": {
        "properties": {
            "keyword": {
                "type": "string",
                "index": "not_analyzed"
            },
            "response": {
                "properties": {
                    "results": {
                        "properties": {
                            "items": {
                                "type": "nested",
                                "properties": {
                                    "prop": {
                                        "properties": {
                                            "item_property_1": {
                                                "type": "string",
                                                "index": "not_analyzed"
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

Bulk insert:

PUT /test2/test3/_bulk
{ "index": {}}
{    "keyword": "my keyword",    "response": [        {            "results": [                {                    "items": [                        {                            "prop": [                                {"item_property_1": ["A"]}                            ]                        },                        {                            "prop": [                                {"item_property_1": ["B"]}                            ]                        },                        {                            "prop": [                                {"item_property_1": ["A"]}                            ]                        }                    ]                }            ]        }    ]}
{ "index": {}}
{    "keyword": "different keyword",    "response": [        {            "results": [                {                    "items": [                        {                            "prop": [                                {"item_property_1": ["A"]}                            ]                        },                        {                            "prop": [                                {"item_property_1": ["C"]}                            ]                        }                    ]                }            ]        }    ]}

Aggregation (zero hits):

POST /test2/test3/_search
{
    "size":0,
    "aggregations": {
        "item_property_1_count": {
            "terms":{
                "field":"item_property_1"
            }
        }
    }
}


Recommended answer

It's not really different from the previous answer. All you need to do is modify the field names slightly to account for the additional nesting. Other than that, nothing needs to change in the mapping. Note that this query works without mapping changes only because response and results are both arrays with a single element. If that weren't the case, it would be more involved and would require mapping changes and a different query.

The query now looks like this:

{
  "size": 0,
  "aggregations": {
    "by_keyword": {
      "terms": {
        "field": "keyword"
      },
      "aggs": {
        "prop_1_count": {
          "nested": {
            "path": "response.results.items"
          },
          "aggs": {
            "prop_1": {
              "terms": {
                "field": "response.results.items.prop.item_property_1"
              }
            }
          }
        }
      }
    }
  }
}
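
For anyone who assembles the request in code rather than typing it into a console, here is a minimal Python sketch of the same body. The helper name nested_terms_by_keyword and its parameters are hypothetical, made up for illustration; it only builds a dictionary and never contacts a cluster. The thing to notice is that both the nested path and the inner terms field are written out in full from the document root.

```python
import json


def nested_terms_by_keyword(keyword_field, nested_path, leaf_field):
    """Build a search body: terms on the keyword, then nested, then terms on the leaf.

    Hypothetical helper for illustration only; the aggregation names mirror
    the query above.
    """
    return {
        "size": 0,
        "aggregations": {
            "by_keyword": {
                "terms": {"field": keyword_field},
                "aggs": {
                    "prop_1_count": {
                        # Step into the nested documents using the full path.
                        "nested": {"path": nested_path},
                        "aggs": {
                            "prop_1": {
                                # The leaf field is also given as a full path.
                                "terms": {"field": f"{nested_path}.{leaf_field}"}
                            }
                        },
                    }
                },
            }
        },
    }


body = nested_terms_by_keyword(
    keyword_field="keyword",
    nested_path="response.results.items",
    leaf_field="prop.item_property_1",
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the _search request against the index and type used in the question.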

Results:

{
  ...
  "aggregations" : {
    "by_keyword" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "different keyword",
        "doc_count" : 1,
        "prop_1_count" : {
          "doc_count" : 2,
          "prop_1" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [ {
              "key" : "A",
              "doc_count" : 1
            }, {
              "key" : "C",
              "doc_count" : 1
            } ]
          }
        }
      }, {
        "key" : "my keyword",
        "doc_count" : 1,
        "prop_1_count" : {
          "doc_count" : 3,
          "prop_1" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [ {
              "key" : "A",
              "doc_count" : 2
            }, {
              "key" : "B",
              "doc_count" : 1
            } ]
          }
        }
      } ]
    }
  }
}
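
As a sanity check on these numbers, the pure-Python sketch below recounts the two sample documents from the bulk insert by hand. It is not how Elasticsearch computes the aggregation; it only walks the same structure, treating every element of items as one nested document. The function name count_by_keyword is an assumption made for this example.

```python
from collections import Counter

# The two documents from the bulk insert in the question.
docs = [
    {"keyword": "my keyword",
     "response": [{"results": [{"items": [
         {"prop": [{"item_property_1": ["A"]}]},
         {"prop": [{"item_property_1": ["B"]}]},
         {"prop": [{"item_property_1": ["A"]}]},
     ]}]}]},
    {"keyword": "different keyword",
     "response": [{"results": [{"items": [
         {"prop": [{"item_property_1": ["A"]}]},
         {"prop": [{"item_property_1": ["C"]}]},
     ]}]}]},
]


def count_by_keyword(documents):
    """Per keyword, count items and how many items carry each item_property_1 value."""
    summary = {}
    for doc in documents:
        entry = summary.setdefault(doc["keyword"], {"items": 0, "terms": Counter()})
        for response in doc["response"]:
            for result in response["results"]:
                for item in result["items"]:
                    entry["items"] += 1
                    # Use a set so a value is counted once per item, the way a
                    # terms bucket counts documents rather than occurrences.
                    values = {v for prop in item["prop"]
                              for v in prop["item_property_1"]}
                    entry["terms"].update(values)
    return summary


summary = count_by_keyword(docs)
for keyword, entry in sorted(summary.items()):
    print(keyword, entry["items"], dict(entry["terms"]))
```

The item counts correspond to the doc_count of prop_1_count in the response above, and the per-value counts correspond to the prop_1 buckets.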
