Elastic Search Querying/filtering nested arrays


Question

I have stored the following kind of nested data in my index test_agg in ES.

{
  "Date": "2015-10-21",
  "Domain": "abc.com",
  "Processed_at": "10/23/2015 9:47",
  "Events": [
    {
      "Name": "visit",
      "Count": "188",
      "Value_Aggregations": [
        {
          "Value": "red",
          "Count": "100"
        }
      ]
    },
    {
      "Name": "order_created",
      "Count": "159",
      "Value_Aggregations": [
        {
          "Value": "$125",
          "Count": "50"
        }
      ]
    }
  ]
}

The mapping for the nested item is

curl -XPOST localhost:9200/test_agg/nested_evt/_mapping -d '{
  "nested_evt": {
    "properties": {
      "Events": {
        "type": "nested"
      }
    }
  }
}'

I am trying to get "Events.Count" and "Events.Value_Aggregations.Count" where Events.Name='visit', using the query below:

{
  "fields": ["Events.Count", "Events.Value_Aggregations.Count"],
  "query": {
    "filtered": {
      "query": {
        "match": { "Domain": "abc.com" }
      },
      "filter": {
        "nested": {
          "path": "Events",
          "query": {
            "match": { "Events.Name": "visit" }
          }
        }
      }
    }
  }
}

Instead of the single values

Events.Count=[188] Events.Value_Aggregations.Count=[100]

it returns

Events.Count=[188,159] Events.Value_Aggregations.Count=[100,50]

What is the exact query structure to get my desired output?

Recommended answer

The problem is that the nested filter you are applying selects parent documents based on attributes of their nested child documents. ES finds the parent document that matches your query (based on that document's nested children). Then, because you have specified "fields", it returns only the fields you asked for instead of the entire document. Those fields happen to be nested fields, and since the parent document has two nested children, ES finds two values for each field you specified and returns them all. To my knowledge there is no way to return only the child documents instead, at least with a nested architecture.
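To make that behaviour concrete, here is a small toy model in plain Python. It is not Elasticsearch code and does not call any Elasticsearch API; the helper names `nested_filter_matches` and `extract_field` are made up for illustration. It only mimics the two steps described above: the nested filter decides whether the parent is a hit, and field extraction then collects values from every nested child of that parent.

```python
# Toy model of the behaviour described above; not Elasticsearch code.
doc = {
    "Domain": "abc.com",
    "Events": [
        {"Name": "visit", "Count": "188",
         "Value_Aggregations": [{"Value": "red", "Count": "100"}]},
        {"Name": "order_created", "Count": "159",
         "Value_Aggregations": [{"Value": "$125", "Count": "50"}]},
    ],
}


def nested_filter_matches(parent, name):
    """Step 1: the nested filter only decides whether the PARENT is a hit."""
    return any(event["Name"] == name for event in parent["Events"])


def extract_field(parent, path):
    """Step 2: collect every value along a dotted path, across ALL children."""
    values = [parent]
    for key in path.split("."):
        next_values = []
        for value in values:
            items = value if isinstance(value, list) else [value]
            next_values.extend(item[key] for item in items if key in item)
        values = next_values
    return values


# The parent matches because one of its children is a "visit" event ...
hits = [d for d in [doc] if nested_filter_matches(d, "visit")]

# ... but field extraction does not know which child caused the match.
result = {
    field: extract_field(hits[0], field)
    for field in ("Events.Count", "Events.Value_Aggregations.Count")
}
print(result)
```

The filter step never narrows the set of children, which is why both 188 and 159 come back.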

One solution would be to use a parent/child relationship instead. You could then run a has_parent query, combined with your other filters, against the child type to get what you want. That would probably be a cleaner approach, as long as the schema change doesn't conflict with your other needs.
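As a rough sketch of what that could look like, the Python snippet below only builds and prints a request body; it does not talk to a cluster. It assumes a remodelled schema that is not in the question: each event is indexed as its own child document with top-level `Name` and `Count` fields, and its parent is a document of a hypothetical type `domain_day` holding `Domain`. The `filtered` query and the `parent_type` key follow the older query DSL used elsewhere in this thread; newer Elasticsearch versions use different syntax.

```python
import json

# Hypothetical parent type name; the original question defines no such type.
PARENT_TYPE = "domain_day"


def build_child_query(domain, event_name):
    """Build a search body meant to be run against the (hypothetical) child type.

    Each hit would be a single event document, so its Count would belong
    to exactly one event instead of being mixed with its siblings.
    """
    return {
        "query": {
            "filtered": {
                "filter": {
                    "bool": {
                        "must": [
                            # Condition on the child document itself.
                            {"term": {"Name": event_name}},
                            # Condition on the parent the child belongs to.
                            {
                                "has_parent": {
                                    "parent_type": PARENT_TYPE,
                                    "query": {"match": {"Domain": domain}},
                                }
                            },
                        ]
                    }
                }
            }
        }
    }


body = build_child_query("abc.com", "visit")
print(json.dumps(body, indent=2))
```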

However, there is a way to get roughly what you are asking for with your current schema: a nested aggregation combined with a filter aggregation. It's somewhat involved (and slightly ambiguous in this case; see the caveat below), but here's the query:

POST /test_index/_search
{
   "size": 0,
   "query": {
      "filtered": {
         "query": {
            "match": {
               "Domain": "abc.com"
            }
         },
         "filter": {
            "nested": {
               "path": "Events",
               "query": {
                  "match": {
                     "Events.Name": "visit"
                  }
               }
            }
         }
      }
   },
   "aggs": {
      "nested_events": {
         "nested": {
            "path": "Events"
         },
         "aggs": {
            "filtered_events": {
               "filter": {
                  "term": {
                     "Events.Name": "visit"
                  }
               },
               "aggs": {
                  "events_count_terms": {
                     "terms": {
                        "field": "Events.Count"
                     }
                  },
                  "value_aggregations_count_terms": {
                     "terms": {
                        "field": "Events.Value_Aggregations.Count"
                     }
                  }
               }
            }
         }
      }
   }
}

This returns:

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "nested_events": {
         "doc_count": 2,
         "filtered_events": {
            "doc_count": 1,
            "value_aggregations_count_terms": {
               "doc_count_error_upper_bound": 0,
               "sum_other_doc_count": 0,
               "buckets": [
                  {
                     "key": "100",
                     "doc_count": 1
                  }
               ]
            },
            "events_count_terms": {
               "doc_count_error_upper_bound": 0,
               "sum_other_doc_count": 0,
               "buckets": [
                  {
                     "key": "188",
                     "doc_count": 1
                  }
               ]
            }
         }
      }
   }
}
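The values you want are the bucket keys inside the aggregation result. A minimal Python sketch for pulling them out of a response shaped like the one above follows; the `bucket_keys` helper is a made-up name, and the `response` dict is just a trimmed copy of that output, not a live call.

```python
# Trimmed copy of the response shown above.
response = {
    "aggregations": {
        "nested_events": {
            "doc_count": 2,
            "filtered_events": {
                "doc_count": 1,
                "value_aggregations_count_terms": {
                    "buckets": [{"key": "100", "doc_count": 1}]
                },
                "events_count_terms": {
                    "buckets": [{"key": "188", "doc_count": 1}]
                },
            },
        }
    }
}


def bucket_keys(resp, terms_agg_name):
    """Return the bucket keys of one terms aggregation under the filter aggregation."""
    filtered = resp["aggregations"]["nested_events"]["filtered_events"]
    return [bucket["key"] for bucket in filtered[terms_agg_name]["buckets"]]


events_count = bucket_keys(response, "events_count_terms")
value_agg_count = bucket_keys(response, "value_aggregations_count_terms")
print(events_count, value_agg_count)
```

Keep in mind that a terms aggregation buckets values across every parent document the query matches, so with more than one matching parent the keys are the distinct values over all of them, not one value per document.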

Caveat: it's not clear to me whether you actually need the "filter": { "nested": { ... } } clause of the "query" shown here. If that clause filters out parent documents in a useful way, then you need it. If your only intention was to select which nested child documents to return fields from, then it's redundant, since the filter aggregation already takes care of that.

Here is the code I used to test it:

http://sense.qbox.io/gist/dcc46e50117031de300b6f91c647fe9b729a5283
