Multi-field terms aggregation approach
I have an index with documents like the following:
[
  {
    "name": "Marco",
    "city_id": 45,
    "city": "Rome"
  },
  {
    "name": "John",
    "city_id": 46,
    "city": "London"
  },
  {
    "name": "Ann",
    "city_id": 47,
    "city": "New York"
  },
  ...
]
and an aggregation:
"aggs": {
  "city": {
    "terms": {
      "field": "city"
    }
  }
}
That gives me a response like this:
{
  "aggregations": {
    "city": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 694,
      "buckets": [
        {
          "key": "Rome",
          "doc_count": 15126
        },
        {
          "key": "London",
          "doc_count": 11395
        },
        {
          "key": "New York",
          "doc_count": 14836
        },
        ...
      ]
    },
    ...
  }
}
My problem is that I need the city_id in my aggregation result as well. I have read that multi-field terms aggregations are not supported, but I don't need to aggregate by two fields; I simply need to return another field whose value is always the same for each term (basically a city/city_id pair). What would be the best way to achieve that without losing performance?
I could create a field named city_with_id with values like "Rome;45", "London;46", etc. and aggregate on that field. That would work for me, because I can simply split the results on my backend to get the ID I need, but is it the best approach?
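For reference, the backend split for such a combined field could look roughly like the sketch below. It assumes a `;` separator and an integer ID; the helper name `split_city_key` and the sample buckets are hypothetical, shaped like the terms response shown earlier.

```python
def split_city_key(key, sep=";"):
    """Split a combined bucket key such as "Rome;45" into (city, city_id)."""
    # Split from the right so a separator inside the city name
    # cannot be mistaken for the boundary before the ID.
    city, city_id = key.rsplit(sep, 1)
    return city, int(city_id)


# Hypothetical buckets from a terms aggregation on city_with_id.
buckets = [
    {"key": "Rome;45", "doc_count": 15126},
    {"key": "London;46", "doc_count": 11395},
    {"key": "New York;47", "doc_count": 14836},
]

# Rebuild one record per bucket with the city and its ID separated.
cities = []
for bucket in buckets:
    city, city_id = split_city_key(bucket["key"])
    cities.append(
        {"city": city, "city_id": city_id, "doc_count": bucket["doc_count"]}
    )
```

The trade-off is that the combined field has to be populated at index time and kept in sync with city and city_id.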
One approach would be to use a top_hits sub-aggregation with source filtering to return only the city_id, as shown in the example below. I don't think this would be prohibitively less performant. You could try it on your indexes to measure the impact before resorting to the combined city_with_id field proposed in the question.
Example:
POST <index>/_search
{
  "size": 0,
  "aggs": {
    "city": {
      "terms": {
        "field": "city"
      },
      "aggs": {
        "id": {
          "top_hits": {
            "_source": {
              "include": [
                "city_id"
              ]
            },
            "size": 1
          }
        }
      }
    }
  }
}
Results:
{
  "key": "London",
  "doc_count": 2,
  "id": {
    "hits": {
      "total": 2,
      "max_score": 1,
      "hits": [
        {
          "_index": "country",
          "_type": "city",
          "_id": "2",
          "_score": 1,
          "_source": {
            "city_id": 46
          }
        }
      ]
    }
  }
},
{
  "key": "New York",
  "doc_count": 1,
  "id": {
    "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
        {
          "_index": "country",
          "_type": "city",
          "_id": "3",
          "_score": 1,
          "_source": {
            "city_id": 47
          }
        }
      ]
    }
  }
},
{
  "key": "Rome",
  "doc_count": 1,
  "id": {
    "hits": {
      "total": 1,
      "max_score": 1,
      "hits": [
        {
          "_index": "country",
          "_type": "city",
          "_id": "1",
          "_score": 1,
          "_source": {
            "city_id": 45
          }
        }
      ]
    }
  }
}
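On the client side, the city/city_id pairs can then be read out of the buckets. The following is a minimal sketch, assuming the response has already been parsed into Python dicts and that the sub-aggregation is named "id" as in the request above; the function name `city_ids_from_buckets` is hypothetical, and the sample buckets are trimmed copies of the results shown.

```python
def city_ids_from_buckets(buckets, sub_agg="id"):
    """Map each bucket key (city name) to the city_id of its top hit."""
    result = {}
    for bucket in buckets:
        hits = bucket[sub_agg]["hits"]["hits"]
        # top_hits was requested with size 1, so at most one hit is present.
        if hits:
            result[bucket["key"]] = hits[0]["_source"]["city_id"]
    return result


# Trimmed copies of the buckets above (only the fields the helper reads).
buckets = [
    {"key": "London", "doc_count": 2,
     "id": {"hits": {"total": 2, "hits": [{"_source": {"city_id": 46}}]}}},
    {"key": "New York", "doc_count": 1,
     "id": {"hits": {"total": 1, "hits": [{"_source": {"city_id": 47}}]}}},
    {"key": "Rome", "doc_count": 1,
     "id": {"hits": {"total": 1, "hits": [{"_source": {"city_id": 45}}]}}},
]

city_ids = city_ids_from_buckets(buckets)
```

This relies on every document in a bucket sharing the same city_id, which is the premise stated in the question.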