Multi-field terms aggregation approach

Views: 100  Published: 2017/8/7 3:35:22  Tags: elasticsearch

Problem description

I have an index with documents like the following:

[
    {
        "name": "Marco",
        "city_id": 45,
        "city": "Rome"
    },
    {
        "name": "John",
        "city_id": 46,
        "city": "London"
    },
    {
        "name": "Ann",
        "city_id": 47,
        "city": "New York"
    },
    ...
]

and an aggregation:

"aggs": {
    "city": {
        "terms": {
            "field": "city"
        }
    }
}

That gives me a response like this:

{
    "aggregations": {    
        "city": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 694,
            "buckets": [
                {
                    "key": "Rome",
                    "doc_count": 15126
                },
                {
                    "key": "London",
                    "doc_count": 11395
                },
                {
                    "key": "New York",
                    "doc_count": 14836
                },
                ...
            ]
        },
        ...
    }
}

My problem is that I need the city_id in my aggregation result as well. I have read that multi-field terms aggregations are not supported, but I don't need to aggregate by two fields; I only need to return another field whose value is always the same for a given term (basically a city/city_id pair). What would be the best way to achieve that without losing performance?

I can create a field named city_with_id with values like "Rome;45", "London;46", etc. and aggregate by this field. That would work for me because I can simply split the results on my backend and get the ID I need, but is it the best approach?
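
For illustration, the backend split described above could look like the following Python sketch. The helper name `split_city_buckets`, the `CityBucket` record, and the sample buckets are hypothetical; the only thing taken from the question is the `"Rome;45"` key format with `;` as the separator.

```python
from collections import namedtuple

# Hypothetical record type for one parsed bucket.
CityBucket = namedtuple("CityBucket", ["city", "city_id", "doc_count"])


def split_city_buckets(buckets, sep=";"):
    """Turn terms buckets keyed by 'city;id' into CityBucket records."""
    parsed = []
    for bucket in buckets:
        # rpartition splits on the LAST separator, so a city name that
        # happens to contain the separator is still parsed correctly.
        city, _, raw_id = bucket["key"].rpartition(sep)
        parsed.append(CityBucket(city, int(raw_id), bucket["doc_count"]))
    return parsed


# Made-up sample shaped like the "buckets" array of a terms aggregation
# on the combined city_with_id field.
sample_buckets = [
    {"key": "Rome;45", "doc_count": 15126},
    {"key": "London;46", "doc_count": 11395},
]

parsed_buckets = split_city_buckets(sample_buckets)
for item in parsed_buckets:
    print(item.city, item.city_id, item.doc_count)
```

The trade-off is that the combined field has to be written at index time, so existing documents would need to be reindexed before this works.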

Solution

One approach would be to use top_hits with source filtering to return only the city_id, as shown in the example below. I don't think this would be prohibitively less performant. You could try it on your indexes to see the impact before trying the combined city_with_id field described in the question.

Example:

    POST <index>/_search
    {
        "size" : 0,
        "aggs": {
            "city": {
                "terms": {
                    "field": "city"
                },
                "aggs" : {
                    "id" : {
                        "top_hits" : {
                            "_source": {
                                "include": [
                                    "city_id"
                                ]
                            },
                            "size" : 1
                        }
                    }
                }
            }
        }
    }
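
If the request body is assembled in application code, a small builder keeps the field names in one place. This is only a sketch: the function name `city_with_id_query` and its parameters are hypothetical, and it only builds the dictionary; sending it to the `_search` endpoint is left to whatever client you use. The `"include"` key mirrors the example above; newer Elasticsearch versions may expect `"includes"`, so check the documentation for your version.

```python
import json


def city_with_id_query(terms_field="city", id_field="city_id", bucket_count=10):
    """Build a terms aggregation with a one-document top_hits sub-aggregation."""
    return {
        "size": 0,  # no search hits needed, only the aggregation result
        "aggs": {
            "city": {
                "terms": {"field": terms_field, "size": bucket_count},
                "aggs": {
                    "id": {
                        "top_hits": {
                            # Source filtering: return only the ID field.
                            "_source": {"include": [id_field]},
                            # One document per bucket is enough because the
                            # ID is the same for every document of a city.
                            "size": 1,
                        }
                    }
                },
            }
        },
    }


body = city_with_id_query()
print(json.dumps(body, indent=2))
```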

Results:

            {
               "key": "London",
               "doc_count": 2,
               "id": {
                  "hits": {
                     "total": 2,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "country",
                           "_type": "city",
                           "_id": "2",
                           "_score": 1,
                           "_source": {
                              "city_id": 46
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "New York",
               "doc_count": 1,
               "id": {
                  "hits": {
                     "total": 1,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "country",
                           "_type": "city",
                           "_id": "3",
                           "_score": 1,
                           "_source": {
                              "city_id": 47
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": "Rome",
               "doc_count": 1,
               "id": {
                  "hits": {
                     "total": 1,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "country",
                           "_type": "city",
                           "_id": "1",
                           "_score": 1,
                           "_source": {
                              "city_id": 45
                           }
                        }
                     ]
                  }
               }
            }
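
To turn buckets like these into a plain city-to-ID lookup on the backend, something along the following lines should work. It is a minimal Python sketch: the helper name `city_ids_from_buckets` is hypothetical, and the sample data is trimmed to the fields the helper reads. In a full response the list would sit under `aggregations` → `city` → `buckets`.

```python
def city_ids_from_buckets(buckets, sub_agg="id", id_field="city_id"):
    """Map each bucket key (city name) to the ID found in its top hit."""
    mapping = {}
    for bucket in buckets:
        hits = bucket[sub_agg]["hits"]["hits"]
        if hits:  # skip buckets whose top_hits came back empty
            mapping[bucket["key"]] = hits[0]["_source"][id_field]
    return mapping


# Buckets shaped like the results above, reduced to the relevant fields.
sample_buckets = [
    {"key": "London", "doc_count": 2,
     "id": {"hits": {"total": 2, "hits": [{"_id": "2", "_source": {"city_id": 46}}]}}},
    {"key": "New York", "doc_count": 1,
     "id": {"hits": {"total": 1, "hits": [{"_id": "3", "_source": {"city_id": 47}}]}}},
    {"key": "Rome", "doc_count": 1,
     "id": {"hits": {"total": 1, "hits": [{"_id": "1", "_source": {"city_id": 45}}]}}},
]

city_ids = city_ids_from_buckets(sample_buckets)
print(city_ids)
```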
