Combine results of multiple aggregations


Question

I have a movies index in which each document has this structure:

{
    "color": "Color",
    "director_name": "Sam Raimi",
    "actor_2_name": "James Franco",
    "movie_title": "Spider-Man 2",
    "actor_3_name": "Brad Pitt",
    "actor_1_name": "J.K. Simmons"
}

I need to count the number of movies for each actor (an actor can appear in any of the actor_1_name, actor_2_name, or actor_3_name fields).

The mappings for these three fields are:

"mappings": {
            "properties": {
                "actor_1_name": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "actor_2_name": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "actor_3_name": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                }
            }
}

Is there a way to get an aggregated result that combines the terms from all three actor fields into a single aggregation?

Currently I create a separate aggregation for each actor field and combine these aggregations into one in my Java code.

Search query that creates the separate aggregations:

{
    "aggs" : {
        "actor1_count" : {
            "terms" : {
                "field" : "actor_1_name.keyword"
            }
        },
        "actor2_count" : {
            "terms" : {
                "field" : "actor_2_name.keyword"
            }
        },
        "actor3_count" : {
            "terms" : {
                "field" : "actor_3_name.keyword"
            }
        }
    }
}

Sample result:

"aggregations": {
    "actor1_count": {
        "buckets": [
            {
                "key": "Johnny Depp",
                "doc_count": 2
            }
        ]
    },
    "actor2_count": {
        "buckets": [
            {
                "key": "Johnny Depp",
                "doc_count": 1
            }
        ]
    },
    "actor3_count": {
        "buckets": [
            {
                "key": "Johnny Depp",
                "doc_count": 3
            }
        ]
    }
}
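For reference, the client-side merge described above can be sketched roughly as follows. This is only an illustration: the class name ActorCountMerger and the use of plain Map<String, Long> values to stand in for each aggregation's buckets are assumptions made for the example, not types from the Elasticsearch Java client.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ActorCountMerger {

    /**
     * Sums doc_count per actor across several bucket lists.
     * Each inner map stands in for the buckets of one terms aggregation
     * (key = actor name, value = doc_count). This shape is an assumption
     * for illustration, not the Elasticsearch client's own type.
     */
    public static Map<String, Long> merge(List<Map<String, Long>> aggregations) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Map<String, Long> buckets : aggregations) {
            for (Map.Entry<String, Long> bucket : buckets.entrySet()) {
                // Add this bucket's count to the running total for the actor.
                totals.merge(bucket.getKey(), bucket.getValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Bucket counts taken from the sample result above.
        Map<String, Long> totals = merge(List.of(
                Map.of("Johnny Depp", 2L),   // actor1_count
                Map.of("Johnny Depp", 1L),   // actor2_count
                Map.of("Johnny Depp", 3L))); // actor3_count
        System.out.println(totals); // prints {Johnny Depp=6}
    }
}
```

One caveat with this approach: a terms aggregation returns only the top 10 buckets by default, so an actor who falls outside the top buckets of one field is undercounted unless size is raised.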

So, instead of creating separate aggregations, is it possible to combine the results of all three aggregations into one aggregation through Elasticsearch?

Basically, this is what I want:

"aggregations": {
    "actor_count": {
        "buckets": [
            {
                "key": "Johnny Depp",
                "doc_count": 6
            }
        ]
    }
}

(The doc_count for Johnny Depp should be the sum across all three fields, actor_1_name, actor_2_name, and actor_3_name, wherever the name is present.)

I have tried a script, but it did not work correctly.

{
    "aggregations": {
        "name": {
            "terms": {
                "script": "doc['actor_1_name.keyword'].value + ' ' +  doc['actor_2_name.keyword'].value + ' ' + doc['actor_2_name.keyword'].value"
            }
        }
    }
}

It concatenates the actor names into a single string and then aggregates on that combined key:

"buckets": [
    {
        "key": "Steve Buscemi Adam Sandler Adam Sandler",
        "doc_count": 6
    },
    {
        "key": "Leonard Nimoy Nichelle Nichols Nichelle Nichols",
        "doc_count": 4
    }
]

Recommended answer

This is not going to work with terms. You will have to resort to scripted_metric, I think:

GET actors/_search
{
  "size": 0,
  "aggs": {
    "merged_actors": {
      "scripted_metric": {
        "init_script": "state.actors_map=[:]",
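        "map_script": """
          def actor_keys = ['actor_1_name', 'actor_2_name', 'actor_3_name'];

          for (def key : actor_keys) {
            def field = key + '.keyword';

            // Skip documents that have no value for this actor field.
            if (doc[field].size() == 0) {
              continue;
            }

            def actor_name = doc[field].value;

            // Count one movie for this actor, whichever field the name is in.
            if (state.actors_map.containsKey(actor_name)) {
              state.actors_map[actor_name] += 1;
            } else {
              state.actors_map[actor_name] = 1;
            }
          }
        """,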
        "combine_script": "return state",
        "reduce_script": "return states"
      }
    }
  }
}

This yields:

...
"aggregations" : {
    "merged_actors" : {
      "value" : [
        {
          "actors_map" : {
            "Brad Pitt" : 5,
            "J.K. Simmons" : 1,
            "James Franco" : 3
          }
        }
      ]
    }
  }
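Note that the reduce_script returns states unchanged, and states holds one entry per shard, so on an index with more than one shard the value list would contain several actors_map objects that still need to be summed. A minimal sketch of doing that on the client follows. The class name ShardStateMerger, the nested-map input shape, and the two-shard split in main are assumptions for illustration (only the totals match the response above), and the sort by count is added to mimic the ordering of terms buckets.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ShardStateMerger {

    /**
     * Sums the "actors_map" of every per-shard state, then sorts the
     * totals by count (highest first), breaking ties by actor name.
     * The nested-map input shape is an assumption that mirrors the JSON
     * under "value"; it is not an Elasticsearch client type.
     */
    public static List<Map.Entry<String, Long>> mergeAndSort(
            List<Map<String, Map<String, Number>>> states) {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Map<String, Number>> state : states) {
            Map<String, Number> actors = state.get("actors_map");
            if (actors == null) {
                continue; // state without an actors_map entry
            }
            for (Map.Entry<String, Number> e : actors.entrySet()) {
                totals.merge(e.getKey(), e.getValue().longValue(), Long::sum);
            }
        }
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(totals.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical split of the counts above across two shards.
        Map<String, Number> shard1 = Map.of("Brad Pitt", 3, "James Franco", 3);
        Map<String, Number> shard2 = Map.of("Brad Pitt", 2, "J.K. Simmons", 1);
        List<Map<String, Map<String, Number>>> states = List.of(
                Map.of("actors_map", shard1),
                Map.of("actors_map", shard2));
        System.out.println(mergeAndSort(states));
        // prints [Brad Pitt=5, James Franco=3, J.K. Simmons=1]
    }
}
```

The same summation could instead be written inside the reduce_script, so that Elasticsearch returns a single merged map; doing it on the client simply keeps the query identical to the one shown above.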
