Elasticsearch metric aggregation: number of elements in array

Published: 2017/8/7 0:01:59 · Tags: elasticsearch, aggregate


Problem description

I want to do a quite involved query/aggregation. I can't see how because I've just started working with ES. The documents I have look something like this:

{
  "keyword": "some keyword",
  "items": [
    {
      "name":"my first item",
      "item_property_1":"A",
      ( other properties here )
    },
    {
      "name":"my second item",
      "item_property_1":"B",
      ( other properties here )
    },
    {
      "name":"my third item",
      "item_property_1":"A",
      ( other properties here )
    }
  ]
  ( other properties... )
},
{
  "keyword": "different keyword",
  "items": [
    {
      "name":"cool item",
      "item_property_1":"A",
      ( other properties here )
    },
    {
      "name":"awesome item",
      "item_property_1":"C",
      ( other properties here )
    }
  ]
  ( other properties... )
},
( other documents... )

Now, for each keyword, I would like to count how many items there are for each of the possible values that item_property_1 can have. That is, I want a bucket aggregation that would produce the following response:

{
  "keyword": "some keyword",
  "item_property_1_aggregation": [
    {
      "key":"A",
      "count": 2
    },
    {
      "key":"B",
      "count": 1
    }
  ]
},
{
  "keyword": "different keyword",
  "item_property_1_aggregation": [
    {
      "key":"A",
      "count": 1
    },
    {
      "key":"C",
      "count": 1
    }
  ]
},
( other keywords... )
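
To pin down what that response should contain, here is a minimal Python sketch that computes the same per-keyword counts in memory from the two sample documents. It only illustrates the expected result, not how Elasticsearch computes it; the names `docs` and `count_by_keyword` are made up for this illustration.

```python
from collections import Counter

# The two sample documents from above, without the elided "other properties".
docs = [
    {"keyword": "some keyword", "items": [
        {"name": "my first item", "item_property_1": "A"},
        {"name": "my second item", "item_property_1": "B"},
        {"name": "my third item", "item_property_1": "A"},
    ]},
    {"keyword": "different keyword", "items": [
        {"name": "cool item", "item_property_1": "A"},
        {"name": "awesome item", "item_property_1": "C"},
    ]},
]


def count_by_keyword(documents):
    """Return {keyword: Counter of item_property_1 values over that keyword's items}."""
    result = {}
    for doc in documents:
        # Documents sharing a keyword accumulate into the same counter.
        counter = result.setdefault(doc["keyword"], Counter())
        counter.update(item["item_property_1"] for item in doc["items"])
    return result


counts = count_by_keyword(docs)
print(dict(counts["some keyword"]))
print(dict(counts["different keyword"]))
```

The counting is per item, grouped by the keyword of the enclosing document, which is the grouping the aggregation has to reproduce.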

If mappings are necessary, could you also specify which? I don't have any non-default mappings, I just dumped everything in there.

EDIT: To save you the trouble, here is the bulk request for the previous example:

PUT /test/test/_bulk
{ "index": {}}
{  "keyword": "some keyword",  "items": [    {      "name":"my first item",      "item_property_1":"A"    },    {      "name":"my second item",      "item_property_1":"B"    },    {      "name":"my third item",      "item_property_1":"A"     }  ]}
{ "index": {}}
{  "keyword": "different keyword",  "items": [    {      "name":"cool item",      "item_property_1":"A"    },    {      "name":"awesome item",      "item_property_1":"C"    }  ]}

EDIT2:

I just tried this:

POST /test/test/_search
{
    "size":2,
    "aggregations": {
        "property_1_count": {
            "terms":{
                "field":"item_property_1"
            }
        }
    }
}

and got this:

"aggregations": {
   "property_1_count": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
         {
            "key": "a",
            "doc_count": 2
         },
         {
            "key": "b",
            "doc_count": 1
         },
         {
            "key": "c",
            "doc_count": 1
         }
      ]
   }
}

Close, but no cigar. You can see what's happening: it's bucketing over each item_property_1 irrespective of the keyword it belongs to. I'm sure the solution involves adding some mapping correctly, but I can't put my finger on it. Suggestions?
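
That result can be reproduced in a few lines of Python. This is a rough model resting on two assumptions that the post does not state but that match the output above: a plain terms aggregation counts documents rather than individual items, and with the default dynamic mapping the string values are analyzed (lowercased) before being bucketed. The function name `flat_terms_doc_counts` is hypothetical.

```python
from collections import Counter

# Sample documents reduced to the fields that matter here.
docs = [
    {"keyword": "some keyword",
     "items": [{"item_property_1": "A"}, {"item_property_1": "B"}, {"item_property_1": "A"}]},
    {"keyword": "different keyword",
     "items": [{"item_property_1": "A"}, {"item_property_1": "C"}]},
]


def flat_terms_doc_counts(documents, field):
    """Model a terms aggregation over a non-nested array of objects."""
    buckets = Counter()
    for doc in documents:
        # Each document contributes at most once per distinct term,
        # and the keyword it belongs to plays no part in the bucketing.
        # Lowercasing stands in for the assumed default analyzer.
        terms = {item[field].lower() for item in doc["items"]}
        buckets.update(terms)
    return buckets


buckets = flat_terms_doc_counts(docs, "item_property_1")
print(sorted(buckets.items()))
```

In this model `a` ends up at 2 because two documents contain it, not because of how many items carry the value A, so the numbers cannot be read as per-keyword item counts.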

EDIT3: Based on https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-nested-type.html, I want to try making the items property a nested type. To do that, I tried:

PUT /test/_mapping/test
{
    "test":{
        "properties": {
            "items": {
                "type": "nested",
                "properties": {
                    "item_property_1":{"type":"string"}
                }
            }
        }
    }
}

However, this returns an error:

{
   "error": "MergeMappingException[Merge failed with failures {[object mapping [items] can't be changed from non-nested to nested]}]",
   "status": 400
}

This might have to do with the warning on that url: "changing an object type to nested type requires reindexing."

So, how do I do that?

Solution

Nice try, you were almost there! Here is what I came up with. Based on your mapping proposal, the mapping I'm using is the following:

curl -XPUT localhost:9200/test/_mapping/test -d '{
  "test": {
    "properties": {
      "keyword": {
        "type": "string",
        "index": "not_analyzed"
      },
      "items": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "string"
          },
          "item_property_1": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}'

Note: you need to wipe and reindex your data, since you cannot change an existing field from non-nested to nested.

Then I created some data with the bulk query you shared:

curl -XPOST localhost:9200/test/test/_bulk -d '
{ "index": {}}
{  "keyword": "some keyword",  "items": [    {      "name":"my first item",      "item_property_1":"A"    },    {      "name":"my second item",      "item_property_1":"B"    },    {      "name":"my third item",      "item_property_1":"A"     }  ]}
{ "index": {}}
{  "keyword": "different keyword",  "items": [    {      "name":"cool item",      "item_property_1":"A"    },    {      "name":"awesome item",      "item_property_1":"C"    }  ]}
'
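
If the documents live in application code rather than in a hand-written request, the bulk body can be generated. The sketch below assumes Python and only builds the newline-delimited payload; sending it to the cluster is left out. `build_bulk_body` is a hypothetical helper, not part of any Elasticsearch client.

```python
import json


def build_bulk_body(documents):
    """Build a bulk request body: one action line plus one source line per document."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {}}))  # action line, no explicit _id
        lines.append(json.dumps(doc))            # the document itself, on a single line
    # The bulk endpoint expects the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"keyword": "some keyword", "items": [
        {"name": "my first item", "item_property_1": "A"},
        {"name": "my second item", "item_property_1": "B"},
        {"name": "my third item", "item_property_1": "A"},
    ]},
    {"keyword": "different keyword", "items": [
        {"name": "cool item", "item_property_1": "A"},
        {"name": "awesome item", "item_property_1": "C"},
    ]},
]

body = build_bulk_body(docs)
print(body, end="")
```

Keeping every document on a single line matters, because the bulk format uses line breaks as the separator between actions and sources.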

Finally, here is the aggregation query you can use to get the results you expect. We first bucket by keyword using a terms aggregation and then for each keyword, we bucket by the nested item_property_1 field. Since items is now a nested type, the key is to use a nested aggregation for items and then a terms sub-aggregation for the item_property_1 field.

{
  "size": 0,
  "aggregations": {
    "by_keyword": {
      "terms": {
        "field": "keyword"
      },
      "aggs": {
        "prop_1_count": {
          "nested": {
            "path": "items"
          },
          "aggs": {
            "prop_1": {
              "terms": {
                "field": "items.item_property_1"
              }
            }
          }
        }
      }
    }
  }
}
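
For reuse with other field names, the same request body can be assembled programmatically. This is only a sketch in Python that produces a plain dict with the structure shown above; `build_nested_terms_query` and its parameters are hypothetical names, and the aggregation names `by_keyword`, `prop_1_count` and `prop_1` are simply carried over from the query.

```python
import json


def build_nested_terms_query(group_field, nested_path, nested_field):
    """Build a terms -> nested -> terms aggregation body as a plain dict."""
    return {
        "size": 0,  # only the aggregations are of interest, not the hits
        "aggregations": {
            "by_keyword": {
                "terms": {"field": group_field},
                "aggs": {
                    "prop_1_count": {
                        # Step into the nested items of each keyword bucket...
                        "nested": {"path": nested_path},
                        "aggs": {
                            # ...and bucket those items by the nested field,
                            # addressed by its full path.
                            "prop_1": {
                                "terms": {"field": f"{nested_path}.{nested_field}"}
                            }
                        },
                    }
                },
            }
        },
    }


query = build_nested_terms_query("keyword", "items", "item_property_1")
print(json.dumps(query, indent=2))
```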

Running that query on your data set will yield this:

{
  ...
  "aggregations" : {
    "by_keyword" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "different keyword",       <---- keyword 1
        "doc_count" : 1,
        "prop_1_count" : {
          "doc_count" : 2,
          "prop_1" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [ {                <---- buckets for item_property_1
              "key" : "A",
              "doc_count" : 1
            }, {
              "key" : "C",
              "doc_count" : 1
            } ]
          }
        }
      }, {
        "key" : "some keyword",            <---- keyword 2
        "doc_count" : 1,
        "prop_1_count" : {
          "doc_count" : 3,
          "prop_1" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [ {                <---- buckets for item_property_1
              "key" : "A",
              "doc_count" : 2
            }, {
              "key" : "B",
              "doc_count" : 1
            } ]
          }
        }
      } ]
    }
  }
}
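
The response is more deeply nested than the flat shape requested in the question. Assuming it has been parsed into a Python dict, a small helper can reshape it into something close to that shape. `flatten_response` and the output key `counts` are illustrative names, and the sample dict below is trimmed to the fields the helper reads.

```python
def flatten_response(response):
    """Turn the nested aggregation response into [{"keyword": ..., "counts": [...]}]."""
    flattened = []
    for keyword_bucket in response["aggregations"]["by_keyword"]["buckets"]:
        # prop_1_count is the nested aggregation, prop_1 the terms aggregation inside it.
        value_buckets = keyword_bucket["prop_1_count"]["prop_1"]["buckets"]
        flattened.append({
            "keyword": keyword_bucket["key"],
            "counts": [
                {"key": b["key"], "count": b["doc_count"]} for b in value_buckets
            ],
        })
    return flattened


# Trimmed copy of the response shown above.
response = {
    "aggregations": {
        "by_keyword": {
            "buckets": [
                {"key": "different keyword", "doc_count": 1,
                 "prop_1_count": {"doc_count": 2, "prop_1": {"buckets": [
                     {"key": "A", "doc_count": 1},
                     {"key": "C", "doc_count": 1},
                 ]}}},
                {"key": "some keyword", "doc_count": 1,
                 "prop_1_count": {"doc_count": 3, "prop_1": {"buckets": [
                     {"key": "A", "doc_count": 2},
                     {"key": "B", "doc_count": 1},
                 ]}}},
            ]
        }
    }
}

result = flatten_response(response)
for entry in result:
    print(entry)
```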
