Multi-field terms aggregation approach


Problem description


I have an index with documents like the following:

[
    {
        "name": "Marco",
        "city_id": 45,
        "city": "Rome"
    },
    {
        "name": "John",
        "city_id": 46,
        "city": "London"
    },
    {
        "name": "Ann",
        "city_id": 47,
        "city": "New York"
    },
    ...
]

and an aggregation:

"aggs": {
    "city": {
        "terms": {
            "field": "city"
        }
    }
}

That gives me a response like this:

{
    "aggregations": {    
        "city": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 694,
            "buckets": [
                {
                    "key": "Rome",
                    "doc_count": 15126
                },
                {
                    "key": "London",
                    "doc_count": 11395
                },
                {
                    "key": "New York",
                    "doc_count": 14836
                },
                ...
          ]
        },
        ...
    }
}

My problem is that I need to have the city_id in my aggregation result as well. I have been reading here that I can't have multi-field terms aggregations. However, I don't need to aggregate by two fields; I simply need to return another field whose value is always the same for each term (basically a city/city_id pair). What would be the best way to achieve that without losing performance?

I can create a field named city_with_id with values like "Rome;45", "London;46", etc. and aggregate by this field. For me it would work, because I can simply split the results on my backend and get the ID I need, but is it the best approach?
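For illustration, the backend split described above could look roughly like the following. This is only a sketch: the helper name `split_city_key` and the `;` separator are assumptions, and the buckets are hand-written to mimic a terms aggregation on the hypothetical city_with_id field.

```python
def split_city_key(key, sep=";"):
    """Split a combined 'city;city_id' bucket key into (city, city_id)."""
    # rpartition splits on the LAST separator, so a city name that happens
    # to contain the separator itself is still handled.
    city, _, city_id = key.rpartition(sep)
    return city, int(city_id)


# Hand-written buckets shaped like a terms aggregation response on the
# hypothetical city_with_id field (doc counts taken from the question).
buckets = [
    {"key": "Rome;45", "doc_count": 15126},
    {"key": "London;46", "doc_count": 11395},
    {"key": "New York;47", "doc_count": 14836},
]

cities = []
for bucket in buckets:
    city, city_id = split_city_key(bucket["key"])
    cities.append(
        {"city": city, "city_id": city_id, "doc_count": bucket["doc_count"]}
    )

print(cities)
```

The trade-off is that the combined field has to be populated at index time and kept consistent with city and city_id.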

Solution

One approach would be to use a top_hits sub-aggregation with source filtering so that only city_id is returned, as shown in the example below. I don't think this would be prohibitively less performant. You could try it on your indexes to measure the impact before trying the combined city_with_id field described in the question.

Example:

    POST <index>/_search
    {
        "size" : 0,
        "aggs": {
            "city": {
                "terms": {
                    "field": "city"
                },
                "aggs" : {
                    "id" : {
                        "top_hits" : {
                            "_source": {
                                "include": [
                                    "city_id"
                                ]
                            },
                            "size" : 1
                        }
                    }
                }
            }
        }
    }

Results:

 {
               "key": "London",
               "doc_count": 2,
               "id": {
                  "hits": {
                     "total": 2,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "country",
                           "_type": "city",
                           "_id": "2",
                           "_score": 1,
                           "_source": {
                              "city_id": 46
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "New York",
               "doc_count": 1,
               "id": {
                  "hits": {
                     "total": 1,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "country",
                           "_type": "city",
                           "_id": "3",
                           "_score": 1,
                           "_source": {
                              "city_id": 47
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "Rome",
               "doc_count": 1,
               "id": {
                  "hits": {
                     "total": 1,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "country",
                           "_type": "city",
                           "_id": "1",
                           "_score": 1,
                           "_source": {
                              "city_id": 45
                           }
                        }
                     ]
                  }
               }
            }
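
On the backend, the city_id can then be read from the first hit of each bucket. The snippet below is a minimal sketch that works on buckets shaped like the results above; the function name `city_ids_from_buckets` is an assumption for illustration, and the sub-aggregation name `id` matches the request in the example.

```python
def city_ids_from_buckets(buckets, sub_agg="id"):
    """Map each bucket key (city) to the city_id of its first top_hits hit."""
    result = {}
    for bucket in buckets:
        hits = bucket[sub_agg]["hits"]["hits"]
        if hits:  # skip a bucket whose top_hits returned no documents
            result[bucket["key"]] = hits[0]["_source"]["city_id"]
    return result


def make_bucket(key, doc_count, doc_id, city_id):
    """Build a bucket with the same shape as the results shown above."""
    hit = {
        "_index": "country",
        "_type": "city",
        "_id": doc_id,
        "_score": 1,
        "_source": {"city_id": city_id},
    }
    return {
        "key": key,
        "doc_count": doc_count,
        "id": {"hits": {"total": doc_count, "max_score": 1, "hits": [hit]}},
    }


buckets = [
    make_bucket("London", 2, "2", 46),
    make_bucket("New York", 1, "3", 47),
    make_bucket("Rome", 1, "1", 45),
]

print(city_ids_from_buckets(buckets))
```

This relies on the assumption stated in the question that every document in a city bucket carries the same city_id, so a single hit per bucket is enough.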
