Elasticsearch - Count duplicated and unique values

Question

I have the following JSON:

[
 {"firstname": "john", "lastname": "doe"},
 {"firstname": "john", "lastname": "smith"},
 {"firstname": "jane", "lastname": "smith"},
 {"firstname": "jane", "lastname": "doe"},
 {"firstname": "joe", "lastname": "smith"},
 {"firstname": "joe", "lastname": "doe"},
 {"firstname": "steve", "lastname": "smith"},
 {"firstname": "jack", "lastname": "doe"}
]

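I want to get the count of duplicated first names:

Duplicate count: 3

and the count of non-duplicated first names:

Non-duplicate count: 2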
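
As a sanity check on these expected numbers, the same counts can be computed outside Elasticsearch. The sketch below is plain Python over the sample documents; the helper name `count_duplicates` is purely illustrative and not part of any Elasticsearch client.

```python
from collections import Counter

# The sample documents from the question.
docs = [
    {"firstname": "john", "lastname": "doe"},
    {"firstname": "john", "lastname": "smith"},
    {"firstname": "jane", "lastname": "smith"},
    {"firstname": "jane", "lastname": "doe"},
    {"firstname": "joe", "lastname": "smith"},
    {"firstname": "joe", "lastname": "doe"},
    {"firstname": "steve", "lastname": "smith"},
    {"firstname": "jack", "lastname": "doe"},
]


def count_duplicates(documents, field="firstname"):
    """Return (values seen more than once, values seen exactly once)."""
    occurrences = Counter(doc[field] for doc in documents)
    duplicated = sum(1 for n in occurrences.values() if n > 1)
    unique = sum(1 for n in occurrences.values() if n == 1)
    return duplicated, unique


print(count_duplicates(docs))  # (3, 2)
```

john, jane and joe each appear twice, while steve and jack appear once, which gives the 3 and 2 asked for above.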

I tried to count the number of buckets, but the count seems to include every bucket, whether the name is duplicated or not:

GET mynames/_search
{
  "aggs": {
    "name_count": {
      "terms": {
        "field": "firstname.keyword",
        "min_doc_count": 2
      }
    },
    "count": {
      "cardinality": {
        "field": "firstname.keyword"
      }
    }
  }
}
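
One way to read the attempt above: the cardinality aggregation counts every distinct first name (5 for the sample data), not only the duplicated ones, while the terms aggregation with min_doc_count 2 returns the duplicated names as buckets but no bucket total. A minimal client-side sketch, assuming the response has the usual aggregations layout, derives both numbers from those two results. The `sample` dictionary is hand-written for illustration, not captured output, and `summarize` is a hypothetical helper.

```python
def summarize(aggregations):
    """Derive both counts from the terms + cardinality results of the attempted query."""
    # Buckets that survived min_doc_count=2 are the duplicated names.
    duplicated = len(aggregations["name_count"]["buckets"])
    # Cardinality reports the number of distinct names, duplicated or not.
    distinct = aggregations["count"]["value"]
    return {"duplicated": duplicated, "non_duplicated": distinct - duplicated}


# Hand-written stand-in for the "aggregations" section of the response (assumption).
sample = {
    "name_count": {
        "buckets": [
            {"key": "jane", "doc_count": 2},
            {"key": "joe", "doc_count": 2},
            {"key": "john", "doc_count": 2},
        ]
    },
    "count": {"value": 5},
}

print(summarize(sample))
```

This only holds while every duplicated name fits in the returned buckets (a terms aggregation returns the top 10 buckets unless `size` is raised), and cardinality is an approximate count on large data sets.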

Recommended answer

I've made use of several aggregations here. They are listed below in the order in which they execute.

Duplicate

  • Terms Aggregation
  • Stats Bucket Aggregation

Non-duplicate

  • Terms Aggregation
    • Bucket Selector (as a sub-aggregation)
  • Sum Bucket Aggregation

POST <your_index_name>/_search
{
  "size": 0,
  "aggs": {
    "duplicate_aggs": {
      "terms": {
        "field": "firstname.keyword",
        "min_doc_count": 2
      }
    },
    "duplicate_bucketcount": {
      "stats_bucket": {
        "buckets_path": "duplicate_aggs._count"
      }
    },
    "nonduplicate_aggs": {
      "terms": {
        "field": "firstname.keyword"
      },
      "aggs": {
        "equal_one": {
          "bucket_selector": {
            "buckets_path": {
              "count": "_count"
            },
            "script": "params.count == 1"
          }
        }
      }
    },
    "nonduplicate_bucketcount": {
      "sum_bucket": {
        "buckets_path": "nonduplicate_aggs._count"
      }
    }
  }
}
      

Response

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 8,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "duplicate_aggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "jane",
          "doc_count": 2
        },
        {
          "key": "joe",
          "doc_count": 2
        },
        {
          "key": "john",
          "doc_count": 2
        }
      ]
    },
    "nonduplicate_aggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "jack",
          "doc_count": 1
        },
        {
          "key": "steve",
          "doc_count": 1
        }
      ]
    },
    "duplicate_bucketcount": {
      "count": 3,
      "min": 2,
      "max": 2,
      "avg": 2,
      "sum": 6
    },
    "nonduplicate_bucketcount": {
      "value": 2
    }
  }
}
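
To see where the numbers in this response come from, the aggregation steps can be imitated locally. This is a rough Python sketch of what each step does to the sample first names, not how Elasticsearch implements them; the function names simply mirror the aggregation types.

```python
from collections import Counter

firstnames = ["john", "john", "jane", "jane", "joe", "joe", "steve", "jack"]


def terms(values, min_doc_count=1):
    """Imitate a terms aggregation: one bucket per value, filtered by min_doc_count."""
    counts = Counter(values)
    return [
        {"key": key, "doc_count": n}
        for key, n in sorted(counts.items())
        if n >= min_doc_count
    ]


def stats_bucket(buckets):
    """Imitate stats_bucket over _count (assumes at least one bucket)."""
    counts = [b["doc_count"] for b in buckets]
    return {
        "count": len(counts),
        "min": min(counts),
        "max": max(counts),
        "avg": sum(counts) / len(counts),
        "sum": sum(counts),
    }


def bucket_selector_equal_one(buckets):
    """Imitate the bucket_selector script `params.count == 1`."""
    return [b for b in buckets if b["doc_count"] == 1]


def sum_bucket(buckets):
    """Imitate sum_bucket over _count."""
    return {"value": sum(b["doc_count"] for b in buckets)}


duplicate_aggs = terms(firstnames, min_doc_count=2)
nonduplicate_aggs = bucket_selector_equal_one(terms(firstnames))

print(stats_bucket(duplicate_aggs))
print(sum_bucket(nonduplicate_aggs))
```

Note that the sum of doc_count equals the number of non-duplicate buckets only because every bucket that passes the selector has a doc_count of exactly 1.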
      

Notice that in the above response, duplicate_bucketcount.count is 3. That is the number of buckets returned by duplicate_aggs, i.e. the number of first names that occur more than once. Likewise, nonduplicate_bucketcount.value is 2, the number of first names that occur exactly once.
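
On the client side, the two figures can then be read straight out of the parsed response. A minimal sketch, assuming the response body has already been obtained as JSON text; the trimmed `response_text` below keeps only the two pipeline results from the response above, and `extract_counts` is a hypothetical helper name.

```python
import json

# Trimmed copy of the response above, keeping only the two pipeline aggregations.
response_text = """
{
  "aggregations": {
    "duplicate_bucketcount": {"count": 3, "min": 2, "max": 2, "avg": 2, "sum": 6},
    "nonduplicate_bucketcount": {"value": 2}
  }
}
"""


def extract_counts(response):
    """Return (duplicate count, non-duplicate count) from a parsed search response."""
    aggs = response["aggregations"]
    duplicates = aggs["duplicate_bucketcount"]["count"]
    # sum_bucket may report its value as a float, so normalise it to an int.
    non_duplicates = int(aggs["nonduplicate_bucketcount"]["value"])
    return duplicates, non_duplicates


print(extract_counts(json.loads(response_text)))  # (3, 2)
```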

Let me know if this helps!
