Elasticsearch metric aggregation: number of elements in array
I want to run a fairly involved query/aggregation, but I can't see how to do it because I've only just started working with ES. My documents look something like this:
{
  "keyword": "some keyword",
  "items": [
    {
      "name": "my first item",
      "item_property_1": "A",
      ( other properties here )
    },
    {
      "name": "my second item",
      "item_property_1": "B",
      ( other properties here )
    },
    {
      "name": "my third item",
      "item_property_1": "A",
      ( other properties here )
    }
  ]
  ( other properties... )
},
{
  "keyword": "different keyword",
  "items": [
    {
      "name": "cool item",
      "item_property_1": "A",
      ( other properties here )
    },
    {
      "name": "awesome item",
      "item_property_1": "C",
      ( other properties here )
    }
  ]
  ( other properties... )
},
( other documents... )
Now, for each keyword, I would like to count how many items have each of the possible values of item_property_1. That is, I want a bucket aggregation that produces the following response:
{
  "keyword": "some keyword",
  "item_property_1_aggregation": [
    {
      "key": "A",
      "count": 2
    },
    {
      "key": "B",
      "count": 1
    }
  ]
},
{
  "keyword": "different keyword",
  "item_property_1_aggregation": [
    {
      "key": "A",
      "count": 1
    },
    {
      "key": "C",
      "count": 1
    }
  ]
},
( other keywords... )
If mappings are necessary, could you also specify which ones? I don't have any non-default mappings; I just dumped everything in there.
EDIT: To save you the trouble, here is the bulk request for the previous example:
PUT /test/test/_bulk
{ "index": {}}
{ "keyword": "some keyword", "items": [ { "name":"my first item", "item_property_1":"A" }, { "name":"my second item", "item_property_1":"B" }, { "name":"my third item", "item_property_1":"A" } ]}
{ "index": {}}
{ "keyword": "different keyword", "items": [ { "name":"cool item", "item_property_1":"A" }, { "name":"awesome item", "item_property_1":"C" } ]}
EDIT2:
I just tried this:
POST /test/test/_search
{
  "size": 2,
  "aggregations": {
    "property_1_count": {
      "terms": {
        "field": "item_property_1"
      }
    }
  }
}
and got this:
"aggregations": {
  "property_1_count": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "a",
        "doc_count": 2
      },
      {
        "key": "b",
        "doc_count": 1
      },
      {
        "key": "c",
        "doc_count": 1
      }
    ]
  }
}
Close, but no cigar. You can see what's happening: it's bucketing over each `item_property_1` irrespective of the `keyword` it belongs to. I'm sure the solution involves adding some mapping correctly, but I can't put my finger on it. Suggestions?
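For what it's worth, the counts above can be reproduced in plain Python, which makes the behaviour easier to see. This is only a sketch of the idea, not Elasticsearch's actual implementation. It assumes that, without a nested mapping, the array of objects is flattened into one list of values per document, and that the default analyzer lowercases each value (which would explain why the keys come back as a, b, c). The function name `flat_terms_counts` is made up for illustration.

```python
from collections import Counter

# The two sample documents from the bulk request above.
docs = [
    {"keyword": "some keyword", "items": [
        {"name": "my first item", "item_property_1": "A"},
        {"name": "my second item", "item_property_1": "B"},
        {"name": "my third item", "item_property_1": "A"},
    ]},
    {"keyword": "different keyword", "items": [
        {"name": "cool item", "item_property_1": "A"},
        {"name": "awesome item", "item_property_1": "C"},
    ]},
]


def flat_terms_counts(documents):
    """Mimic a top-level terms aggregation on a non-nested array field."""
    counts = Counter()
    for doc in documents:
        # A document counts once per distinct term it contains.
        # lower() stands in for the default analyzer (an assumption).
        terms = {item["item_property_1"].lower() for item in doc["items"]}
        counts.update(terms)
    return dict(counts)


print(flat_terms_counts(docs))
```

Each document is counted once per distinct value, so the second A in the first document does not raise the count, and the keyword never enters the picture.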
EDIT3:
Based on this:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-nested-type.html
I want to try adding a `nested` type to the `items` property. To do that, I tried:
PUT /test/_mapping/test
{
  "test": {
    "properties": {
      "items": {
        "type": "nested",
        "properties": {
          "item_property_1": {"type": "string"}
        }
      }
    }
  }
}
However, this returns an error:
{
  "error": "MergeMappingException[Merge failed with failures {[object mapping [items] can't be changed from non-nested to nested]}]",
  "status": 400
}
This might have to do with the warning at that URL: "changing an object type to nested type requires reindexing."
So, how do I do that?
Nice tries, you were almost there! Here is what I came up with. Based on your mapping proposal, the mapping I'm using is the following:
curl -XPUT localhost:9200/test/_mapping/test -d '{
  "test": {
    "properties": {
      "keyword": {
        "type": "string",
        "index": "not_analyzed"
      },
      "items": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "string"
          },
          "item_property_1": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}'
Note: you need to wipe and reindex your data, since you cannot change an existing field from non-nested to `nested`.
Then I created some data with the bulk query you shared:
curl -XPOST localhost:9200/test/test/_bulk -d '
{ "index": {}}
{ "keyword": "some keyword", "items": [ { "name":"my first item", "item_property_1":"A" }, { "name":"my second item", "item_property_1":"B" }, { "name":"my third item", "item_property_1":"A" } ]}
{ "index": {}}
{ "keyword": "different keyword", "items": [ { "name":"cool item", "item_property_1":"A" }, { "name":"awesome item", "item_property_1":"C" } ]}
'
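If the documents come from a program rather than a hand-written file, the bulk body can be generated instead of typed. Below is a minimal Python sketch using the same two documents. `build_bulk_body` is a hypothetical helper, not part of any Elasticsearch client. It relies on the bulk format used above: one action line followed by one source line per document, each on a single line, with a newline at the end of the body.

```python
import json

# The two sample documents used throughout this thread.
docs = [
    {"keyword": "some keyword", "items": [
        {"name": "my first item", "item_property_1": "A"},
        {"name": "my second item", "item_property_1": "B"},
        {"name": "my third item", "item_property_1": "A"},
    ]},
    {"keyword": "different keyword", "items": [
        {"name": "cool item", "item_property_1": "A"},
        {"name": "awesome item", "item_property_1": "C"},
    ]},
]


def build_bulk_body(documents):
    """Return a newline-delimited bulk body: action line, then source line."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {}}))  # action line, ID auto-generated
        lines.append(json.dumps(doc))            # document source on one line
    return "\n".join(lines) + "\n"               # body ends with a newline


body = build_bulk_body(docs)
print(body, end="")
```

The resulting string can be saved to a file and sent with curl's `--data-binary @file` option, which keeps the newlines intact.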
Finally, here is the aggregation query you can use to get the results you expect. We first bucket by `keyword` using a `terms` aggregation, and then, for each keyword, we bucket by the nested `item_property_1` field. Since `items` is now a `nested` type, the key is to use a `nested` aggregation for `items` and then a `terms` sub-aggregation for the `item_property_1` field.
{
  "size": 0,
  "aggregations": {
    "by_keyword": {
      "terms": {
        "field": "keyword"
      },
      "aggs": {
        "prop_1_count": {
          "nested": {
            "path": "items"
          },
          "aggs": {
            "prop_1": {
              "terms": {
                "field": "items.item_property_1"
              }
            }
          }
        }
      }
    }
  }
}
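To make clear what this combination computes, here is the same counting logic as a plain Python sketch over the two sample documents. It is an illustration only (the helper name `count_property_per_keyword` is hypothetical) and says nothing about how Elasticsearch executes the aggregation. The outer loop plays the role of the `terms` aggregation on `keyword`, and the inner loop visits each item as its own unit, which is what the `nested` aggregation makes possible.

```python
from collections import Counter, defaultdict

# The two sample documents from the bulk request.
docs = [
    {"keyword": "some keyword", "items": [
        {"name": "my first item", "item_property_1": "A"},
        {"name": "my second item", "item_property_1": "B"},
        {"name": "my third item", "item_property_1": "A"},
    ]},
    {"keyword": "different keyword", "items": [
        {"name": "cool item", "item_property_1": "A"},
        {"name": "awesome item", "item_property_1": "C"},
    ]},
]


def count_property_per_keyword(documents):
    """Count item_property_1 values per keyword, one count per item."""
    per_keyword = defaultdict(Counter)
    for doc in documents:                 # outer bucket: keyword
        for item in doc["items"]:         # each nested item counted separately
            per_keyword[doc["keyword"]][item["item_property_1"]] += 1
    return {keyword: dict(counts) for keyword, counts in per_keyword.items()}


print(count_property_per_keyword(docs))
```

Because every item is counted individually, the two A items under "some keyword" give a count of 2, and the values keep their original case since the field is not analyzed.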
Running that query on your data set will yield this:
{
  ...
  "aggregations" : {
    "by_keyword" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "different keyword",      <---- keyword 1
        "doc_count" : 1,
        "prop_1_count" : {
          "doc_count" : 2,
          "prop_1" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [ {               <---- buckets for item_property_1
              "key" : "A",
              "doc_count" : 1
            }, {
              "key" : "C",
              "doc_count" : 1
            } ]
          }
        }
      }, {
        "key" : "some keyword",           <---- keyword 2
        "doc_count" : 1,
        "prop_1_count" : {
          "doc_count" : 3,
          "prop_1" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [ {               <---- buckets for item_property_1
              "key" : "A",
              "doc_count" : 2
            }, {
              "key" : "B",
              "doc_count" : 1
            } ]
          }
        }
      } ]
    }
  }
}
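If you want something closer to the compact shape asked for in the question, the response can be flattened on the client side. Here is a small Python sketch, assuming the response has already been parsed into a dict; the copy below is trimmed to the fields that matter, and `counts_by_keyword` is a hypothetical helper name.

```python
# Trimmed copy of the response above (error-bound fields omitted).
response = {
    "aggregations": {
        "by_keyword": {
            "buckets": [
                {"key": "different keyword", "doc_count": 1,
                 "prop_1_count": {"doc_count": 2, "prop_1": {"buckets": [
                     {"key": "A", "doc_count": 1},
                     {"key": "C", "doc_count": 1},
                 ]}}},
                {"key": "some keyword", "doc_count": 1,
                 "prop_1_count": {"doc_count": 3, "prop_1": {"buckets": [
                     {"key": "A", "doc_count": 2},
                     {"key": "B", "doc_count": 1},
                 ]}}},
            ]
        }
    }
}


def counts_by_keyword(resp):
    """Map each keyword to a {item_property_1 value: count} dict."""
    result = {}
    for keyword_bucket in resp["aggregations"]["by_keyword"]["buckets"]:
        # Walk down through the nested aggregation to the inner terms buckets.
        inner = keyword_bucket["prop_1_count"]["prop_1"]["buckets"]
        result[keyword_bucket["key"]] = {b["key"]: b["doc_count"] for b in inner}
    return result


print(counts_by_keyword(response))
```

The aggregation names `by_keyword`, `prop_1_count` and `prop_1` are the ones chosen in the query above, so the lookup path has to change if you rename them.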