Elasticsearch: group and distribute to buckets


Question

I am quite new to Elasticsearch, but there seems to be no easy way to create an aggregation and then, once that aggregation is done, distribute the resulting doc_count values into buckets. For example, given the data set below, I would like to create 4 buckets and group profiles into them according to their number of transactions.

The total number of profiles should be distributed across the buckets below, where each bucket defines the minimum and maximum number of transactions a profile can have:

number of profiles that have 0-1 transactions
number of profiles that have 2-5 transactions
number of profiles that have 6-20 transactions
number of profiles that have more than 20 transactions
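The four ranges amount to a simple classification of a per-profile transaction count. A minimal Python sketch of that mapping; the function name `transaction_range` and the labels are hypothetical, chosen only to mirror the list above:

```python
def transaction_range(count: int) -> str:
    """Return the range label for a profile with `count` transactions.

    Ranges follow the question: 0-1, 2-5, 6-20 and more than 20.
    """
    if count < 0:
        raise ValueError("transaction count cannot be negative")
    if count <= 1:
        return "0-1"
    if count <= 5:
        return "2-5"
    if count <= 20:
        return "6-20"
    return "20+"


if __name__ == "__main__":
    for n in (0, 1, 2, 5, 6, 20, 21):
        print(n, transaction_range(n))
```

Since the 6-20 range already includes 20, the last range has to start at 21, which is why it is labelled "more than 20".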

[
  {
    "profileId": "AVdiZnj6YuzD-vV0m9lx",
    "transactionId": "sdsfsdghfd"
  },
  {
    "profileId": "SRGDDUUDaasaddsaf",
    "transactionId": "asdadscfdvdvd"
  },
  {
    "profileId": "AVdiZnj6YuzD-vV0m9lx",
    "transactionId": "sdsacfsfcsafcs"
  }
]
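Because each document is one transaction, a profile's transaction count is simply the number of documents sharing its profileId, which is what a terms aggregation reports as doc_count. A small client-side Python illustration over the sample documents above (variable names are hypothetical; this is not Elasticsearch code):

```python
from collections import Counter

# Sample documents from the question: one document per transaction.
docs = [
    {"profileId": "AVdiZnj6YuzD-vV0m9lx", "transactionId": "sdsfsdghfd"},
    {"profileId": "SRGDDUUDaasaddsaf", "transactionId": "asdadscfdvdvd"},
    {"profileId": "AVdiZnj6YuzD-vV0m9lx", "transactionId": "sdsacfsfcsafcs"},
]

# Count documents per profileId, analogous to a terms aggregation's doc_count.
transactions_per_profile = Counter(doc["profileId"] for doc in docs)

if __name__ == "__main__":
    for profile_id, count in transactions_per_profile.most_common():
        print(profile_id, count)
```

One consequence of this data model: a profile with 0 transactions has no documents in this index, so it can never appear in a per-profile count; the 0-1 range can only pick up profiles with exactly one transaction.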



The request below shows the number of transactions per profile, but an additional level of grouping is required to assign the profiles to the respective buckets based on doc_count.

    {
      "size": 0,
      "aggs": {
        "profileTransactions": {
          "terms": {
            "field": "profileId"
          }
        }
      }
    }

Response (excerpt):

    "buckets": [
      {
        "key": "AVdiZnj6YuzD-vV0m9lx",
        "doc_count": 2
      },
      {
        "key": "SRGDDUUDaasaddsaf",
        "doc_count": 1
      }
    ]
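One way to get the range counts without further server-side aggregation is to post-process these buckets on the client. A minimal Python sketch of that alternative, assuming the terms aggregation returns every profile (by default it returns only the top 10 terms); `RANGES` and `profiles_per_range` are hypothetical names:

```python
# Buckets as returned by the terms aggregation in the question.
buckets = [
    {"key": "AVdiZnj6YuzD-vV0m9lx", "doc_count": 2},
    {"key": "SRGDDUUDaasaddsaf", "doc_count": 1},
]

# (label, min, max) for each range; None means no upper bound.
RANGES = [("0-1", 0, 1), ("2-5", 2, 5), ("6-20", 6, 20), ("20+", 21, None)]


def profiles_per_range(buckets):
    """Count how many profile buckets fall into each transaction range."""
    counts = {label: 0 for label, _, _ in RANGES}
    for bucket in buckets:
        n = bucket["doc_count"]
        for label, low, high in RANGES:
            if n >= low and (high is None or n <= high):
                counts[label] += 1
                break
    return counts


if __name__ == "__main__":
    print(profiles_per_range(buckets))
```

This moves the grouping to the client, so it only scales as far as you can afford to return one bucket per profile.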

Any ideas?

Recommended answer

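You can do the additional grouping with a pipeline bucket_selector aggregation. Each of the four sibling terms aggregations below keeps only the profiles whose transaction count falls into one range, so the number of profiles in a range is the number of buckets that aggregation returns. The value_count sub-aggregation is there because bucket_selector evaluates its script against numeric metrics referenced through buckets_path; counting the profileId values in a bucket gives that profile's number of transactions. Pipeline aggregations were introduced in Elasticsearch 2.0, and the query below is written for ES 2.x. Note that a terms aggregation returns only the top 10 terms by default and the selector filters those returned buckets, so raise the terms size if you have more profiles.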
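To make the bucket_selector behaviour concrete: for each bucket of its parent aggregation it resolves the buckets_path variables, evaluates the script, and drops the bucket when the script returns false. A minimal Python simulation of that filtering step (not Elasticsearch code; the function and variable names are hypothetical, and the script is modelled as a Python predicate):

```python
from typing import Any, Callable, Dict, List

Bucket = Dict[str, Any]


def bucket_selector(
    buckets: List[Bucket],
    metric: str,
    predicate: Callable[[float], bool],
) -> List[Bucket]:
    """Keep only the buckets whose `metric` value satisfies `predicate`.

    Mimics a bucket_selector whose buckets_path points at `metric`
    (here the per-profile value_count, i.e. the transaction count).
    """
    return [b for b in buckets if predicate(b[metric]["value"])]


# Terms buckets with a value_count sub-aggregation, shaped like an ES response.
terms_buckets = [
    {"key": "AVdiZnj6YuzD-vV0m9lx", "doc_count": 2, "total_profile_count": {"value": 2}},
    {"key": "SRGDDUUDaasaddsaf", "doc_count": 1, "total_profile_count": {"value": 1}},
]

# Equivalent of the "totalTransaction >= 2 && totalTransaction <= 5" script.
range_2_5 = bucket_selector(
    terms_buckets, "total_profile_count", lambda total: 2 <= total <= 5
)
```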

{
  "size": 0,
  "aggs": {
    "unique_profileId0": {
      "terms": {
        "field": "profileId"
      },
      "aggs": {
        "total_profile_count": {
          "value_count": {
            "field": "profileId"
          }
        },
        "range_0-1_bucket": {
          "bucket_selector": {
            "buckets_path": {
              "totalTransaction": "total_profile_count"
            },
            "script": "totalTransaction < 2"
          }
        }
      }
    },
    "unique_profileId1": {
      "terms": {
        "field": "profileId"
      },
      "aggs": {
        "total_profile_count": {
          "value_count": {
            "field": "profileId"
          }
        },
        "range_2-5_bucket": {
          "bucket_selector": {
            "buckets_path": {
              "totalTransaction": "total_profile_count"
            },
            "script": "totalTransaction >= 2 && totalTransaction <= 5"
          }
        }
      }
    },
    "unique_profileId2": {
      "terms": {
        "field": "profileId"
      },
      "aggs": {
        "total_profile_count": {
          "value_count": {
            "field": "profileId"
          }
        },
        "range_6-20_bucket": {
          "bucket_selector": {
            "buckets_path": {
              "totalTransaction": "total_profile_count"
            },
            "script": "totalTransaction >= 6 && totalTransaction <= 20"
          }
        }
      }
    },
    "unique_profileId3": {
      "terms": {
        "field": "profileId"
      },
      "aggs": {
        "total_profile_count": {
          "value_count": {
            "field": "profileId"
          }
        },
        "range_20_more_bucket": {
          "bucket_selector": {
            "buckets_path": {
              "totalTransaction": "total_profile_count"
            },
            "script": "totalTransaction > 20"
          }
        }
      }
    }
  }
}
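The four aggregations differ only in their names and scripts, so the request can be generated in a loop instead of copied by hand, and the per-range profile counts read back as the number of buckets each aggregation keeps. A sketch assuming Python on the client; `build_query` and `profiles_per_range` are hypothetical helpers, and sending the request is left to whichever client you use:

```python
import json

# (aggregation name suffix, bucket_selector script) for each range, as in the answer.
RANGES = [
    ("0-1", "totalTransaction < 2"),
    ("2-5", "totalTransaction >= 2 && totalTransaction <= 5"),
    ("6-20", "totalTransaction >= 6 && totalTransaction <= 20"),
    ("20_more", "totalTransaction > 20"),
]


def build_query() -> dict:
    """Build the answer's request: one terms aggregation per range."""
    aggs = {}
    for i, (suffix, script) in enumerate(RANGES):
        aggs[f"unique_profileId{i}"] = {
            "terms": {"field": "profileId"},
            "aggs": {
                "total_profile_count": {"value_count": {"field": "profileId"}},
                f"range_{suffix}_bucket": {
                    "bucket_selector": {
                        "buckets_path": {"totalTransaction": "total_profile_count"},
                        "script": script,
                    }
                },
            },
        }
    return {"size": 0, "aggs": aggs}


def profiles_per_range(response: dict) -> dict:
    """Profiles per range = number of buckets left in each aggregation."""
    return {
        suffix: len(response["aggregations"][f"unique_profileId{i}"]["buckets"])
        for i, (suffix, _) in enumerate(RANGES)
    }


if __name__ == "__main__":
    print(json.dumps(build_query(), indent=2))
```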

You need to enable dynamic scripting for this to work. Add the following two lines to elasticsearch.yml on every node:

script.inline: on
script.indexed: on

Then restart each node.

Hope it helps!
