Is it possible to compute "distinct sum" and "distinct average" in Elasticsearch?


Question

How can I calculate a "distinct average" in Elasticsearch? I have some denormalized data like this:

{ "record_id" : "100", "cost" : 42 }
{ "record_id" : "200", "cost" : 67 }
{ "record_id" : "200", "cost" : 67 }
{ "record_id" : "200", "cost" : 67 }
{ "record_id" : "400", "cost" : 11 }
{ "record_id" : "400", "cost" : 11 }
{ "record_id" : "500", "cost" : 10 }
{ "record_id" : "600", "cost" : 99 }

Notice how the "cost" is always the same for a given "record_id".

So, using the data above:

  1. How can I get the AVERAGE of the "cost" field, but DISTINCT by "record_id"? The result would be (42+67+11+10+99)/5 = 45.8
  2. How can I get the SUM of the "cost" field, but DISTINCT by "record_id"? The result would be 42+67+11+10+99 = 229
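To make the expected numbers concrete, here is a minimal Python sketch of the computation being asked for, run over the sample documents above. It is only a reference for the arithmetic, not Elasticsearch code; the helper name `distinct_stats` is made up for illustration, and it relies on the stated assumption that "cost" is identical for every document sharing a "record_id".

```python
# Sample documents copied from the question.
docs = [
    {"record_id": "100", "cost": 42},
    {"record_id": "200", "cost": 67},
    {"record_id": "200", "cost": 67},
    {"record_id": "200", "cost": 67},
    {"record_id": "400", "cost": 11},
    {"record_id": "400", "cost": 11},
    {"record_id": "500", "cost": 10},
    {"record_id": "600", "cost": 99},
]


def distinct_stats(documents):
    """Keep one cost per record_id, then sum and average those costs.

    Hypothetical helper: assumes cost never varies within a record_id.
    """
    cost_by_id = {}
    for d in documents:
        cost_by_id[d["record_id"]] = d["cost"]
    total = sum(cost_by_id.values())
    average = total / len(cost_by_id) if cost_by_id else None
    return {"sum": total, "avg": average}


stats = distinct_stats(docs)
print(stats)  # {'sum': 229, 'avg': 45.8}
```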

Could I use a combination of a "terms" aggregation and then "first" and "average" sub-aggregations? I'm thinking of something like the approach in the question "elasticsearch calculate average of unique values".

Recommended answer

It's not going to work with terms aggs. Here's what's possible using Painless scripts:

Indexing -- your actual mapping may differ from the generated default (especially the .keyword subfield on record_id):

POST _bulk
{"index":{"_index":"uniques","_type":"_doc"}}
{"record_id":"100","cost":42}
{"index":{"_index":"uniques","_type":"_doc"}}
{"record_id":"200","cost":67}
{"index":{"_index":"uniques","_type":"_doc"}}
{"record_id":"200","cost":67}
{"index":{"_index":"uniques","_type":"_doc"}}
{"record_id":"200","cost":67}
{"index":{"_index":"uniques","_type":"_doc"}}
{"record_id":"400","cost":11}
{"index":{"_index":"uniques","_type":"_doc"}}
{"record_id":"400","cost":11}
{"index":{"_index":"uniques","_type":"_doc"}}
{"record_id":"500","cost":10}
{"index":{"_index":"uniques","_type":"_doc"}}
{"record_id":"600","cost":99}

Then aggregate:

GET uniques/_search
{
  "size": 0,
  "aggs": {
    "terms": {
      "scripted_metric": {
        "init_script": "state.costs = [:];",
        "map_script": """
          // Runs per document: remember one cost per record_id on this shard.
          state.costs[doc['record_id.keyword'].value] = doc['cost'].value;
        """,
        "combine_script": "return state.costs;",
        "reduce_script": """
          // Runs once on the coordinating node: merge the per-shard maps
          // so a record_id stored on several shards is counted only once.
          def merged = [:];
          for (shard in states) {
            if (shard != null) {
              merged.putAll(shard);
            }
          }
          double sum = 0.0;
          for (cost in merged.values()) {
            sum += cost;
          }
          def stats = [:];
          stats.sum = sum;
          if (!merged.isEmpty()) {
            stats.avg = sum / merged.size();
          }
          return stats;
        """
      }
    }
  }
}

which yields:

...
"aggregations" : {
    "terms" : {
      "value" : {
        "avg" : 45.8,
        "sum" : 229.0
      }
    }
  }
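
A scripted_metric aggregation runs its init and map scripts on each shard, runs combine once per shard, and runs reduce once on the coordinating node. The sketch below mimics those phases in plain Python to show how the pieces fit together. The function names and the two-shard split of the sample documents are hypothetical, chosen only for illustration. It merges the per-shard maps in the reduce step, since the same record_id can sit on more than one shard and must still be counted once.

```python
def init_state():
    # Per-shard state: one cost per record_id seen on that shard.
    return {"costs": {}}


def map_doc(state, doc):
    # Called once per document on a shard.
    state["costs"][doc["record_id"]] = doc["cost"]


def combine(state):
    # Called once per shard; its return value is sent to the coordinator.
    return state["costs"]


def reduce_states(states):
    # Called once with the list of per-shard results.
    merged = {}
    for shard_costs in states:
        merged.update(shard_costs)
    total = float(sum(merged.values()))
    result = {"sum": total}
    if merged:
        result["avg"] = total / len(merged)
    return result


# Hypothetical shard layout: record_id "200" appears on both shards.
shards = [
    [
        {"record_id": "100", "cost": 42},
        {"record_id": "200", "cost": 67},
        {"record_id": "400", "cost": 11},
        {"record_id": "400", "cost": 11},
    ],
    [
        {"record_id": "200", "cost": 67},
        {"record_id": "200", "cost": 67},
        {"record_id": "500", "cost": 10},
        {"record_id": "600", "cost": 99},
    ],
]

states = []
for shard_docs in shards:
    state = init_state()
    for doc in shard_docs:
        map_doc(state, doc)
    states.append(combine(state))

result = reduce_states(states)
print(result)  # {'sum': 229.0, 'avg': 45.8}
```

Note that every distinct record_id is held in memory, both in this sketch and in a scripted_metric that tracks ids in a map, so the approach suits moderate numbers of distinct ids.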
