Weighted Average for nested aggregation in Elastic Search

Problem description

I am trying to obtain a weighted average by aggregating over a nested list. Each document holds the details of a single student. The subjects vary from student to student, and each subject has a different weight.

I am trying to calculate the weighted average per subject.

My documents have the following format:

[{'class': '10th',
  'id': '1',
  'subject': [{'marks': 60, 'name': 's1', 'weight': 30},
              {'marks': 80, 'name': 's2', 'weight': 70}]},
 {'class': '11th',
  'id': '2',
  'subject': [{'marks': 43, 'name': 's10', 'weight': 40},
              {'marks': 54, 'name': 's20', 'weight': 60}]},
 {'class': '10th',
  'id': '3',
  'subject': [{'marks': 43, 'name': 's1', 'weight': 20},
              {'marks': 54, 'name': 's20', 'weight': 80}]},
 {'class': '10th',
  'id': '4',
  'subject': [{'marks': 69, 'name': 's10', 'weight': 30},
              {'marks': 45, 'name': 's2', 'weight': 70}]}]

Here s1, s10, s2, and s20 are the subjects. For a given class, say "10th", I am trying to aggregate the weighted average.

The query I ran is:

GET students_try/_search
{
  "query": {
    "match": {
      "class": "10th"
    }
  },
  "aggs": {
    "subjects": {
      "nested": {
        "path": "subject"
      },
      "aggs": {
        "subjects": {
          "terms": {
            "field": "subject.name"
          },
          "aggs": {
            "avg_score": {
              "avg": {
                "field": "subject.marks"
              }
            },
            "weighted_grade": {
              "weighted_avg": {
                "value": {
                  "field": "subject.marks"
                },
                "weight": {
                  "field": "subject.weight"
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}

The error I get is:

{u'error': {u'col': 211,
        u'line': 1,
        u'reason': u'Unknown BaseAggregationBuilder [weighted_avg]',
        u'root_cause': [{u'col': 211,
                         u'line': 1,
                         u'reason': u'Unknown BaseAggregationBuilder [weighted_avg]',
                         u'type': u'unknown_named_object_exception'}],
        u'type': u'unknown_named_object_exception'},
 u'status': 400}

I am not sure what is causing the error.

Recommended answer

Yes, the weighted average aggregation that Nishant mentioned is only available from version 6.4 onward, as noted in the "A few others" section of the post detailing the 6.4 release.

However, I've come up with the query below, which uses a Bucket Script Aggregation to calculate the weighted average for every bucket:

POST <your_index_name>/_search
{
  "size": 0,
  "query": {
    "match": {
      "class": "10th"
    }
  },
  "aggs": {
    "subjects": {
      "nested": {
        "path": "subject"
      },
      "aggs": {
        "subjects": {
          "terms": {
            "field": "subject.name.keyword"
          },
          "aggs": {
            "avg_score": {
              "avg": {
                "field": "subject.marks"
              }
            },
            "sum_productOfMarksAndWeight": {
              "sum": {
                "script": "doc['subject.marks'].value * doc['subject.weight'].value"
              }
            },
            "sum_weights": {
              "sum": {
                "field": "subject.weight"
              }
            },
            "weighted_avg":{
              "bucket_script": {
                "buckets_path": {
                  "sumScore": "sum_productOfMarksAndWeight",
                  "sumWeights": "sum_weights"
                },
                "script": "params.sumScore/params.sumWeights"
              }
            }
          }             
        }
      }
    }
  }
}

If you look at the above aggregation carefully, for every bucket I've used the Sum Aggregation to calculate the sum of weights and the sum of the products of weights and marks, and then used these two aggregations to calculate the weighted average.
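The same arithmetic can be checked outside Elasticsearch. The following is a minimal Python sketch, not part of the original answer: it recomputes, per subject, the sum of `marks * weight` divided by the sum of `weight` over the sample documents from the question. The function name `weighted_avg_by_subject` is hypothetical, and the plain equality test on `class` is an assumed stand-in for the `match` query.

```python
from collections import defaultdict

# Sample documents copied from the question.
students = [
    {"class": "10th", "id": "1", "subject": [
        {"marks": 60, "name": "s1", "weight": 30},
        {"marks": 80, "name": "s2", "weight": 70}]},
    {"class": "11th", "id": "2", "subject": [
        {"marks": 43, "name": "s10", "weight": 40},
        {"marks": 54, "name": "s20", "weight": 60}]},
    {"class": "10th", "id": "3", "subject": [
        {"marks": 43, "name": "s1", "weight": 20},
        {"marks": 54, "name": "s20", "weight": 80}]},
    {"class": "10th", "id": "4", "subject": [
        {"marks": 69, "name": "s10", "weight": 30},
        {"marks": 45, "name": "s2", "weight": 70}]},
]


def weighted_avg_by_subject(docs, class_name):
    """Hypothetical helper: weighted average of marks per subject for one class."""
    sum_product = defaultdict(float)  # plays the role of sum_productOfMarksAndWeight
    sum_weights = defaultdict(float)  # plays the role of sum_weights
    for doc in docs:
        if doc["class"] != class_name:  # assumed stand-in for the match query
            continue
        for subj in doc["subject"]:     # one nested object per subject
            sum_product[subj["name"]] += subj["marks"] * subj["weight"]
            sum_weights[subj["name"]] += subj["weight"]
    # Same division the bucket_script performs for each bucket.
    return {name: sum_product[name] / sum_weights[name] for name in sum_product}


result = weighted_avg_by_subject(students, "10th")
print(result)
```

For s1, for example, this works out to (60 * 30 + 43 * 20) / (30 + 20) = 2660 / 50 = 53.2.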

Below is how the response appears. One caveat: the aggregation result also includes the sum of weights and the sum of the products of weights and marks.

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "subjects": {
      "doc_count": 6,
      "subjects": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "s1",
            "doc_count": 2,
            "sum_weights": {
              "value": 50
            },
            "sum_productOfMarksAndWeight": {
              "value": 2660
            },
            "avg_score": {
              "value": 51.5
            },
            "weighted_avg": {
              "value": 53.2
            }
          },
          {
            "key": "s2",
            "doc_count": 2,
            "sum_weights": {
              "value": 140
            },
            "sum_productOfMarksAndWeight": {
              "value": 8750
            },
            "avg_score": {
              "value": 62.5
            },
            "weighted_avg": {
              "value": 62.5
            }
          },
          {
            "key": "s10",
            "doc_count": 1,
            "sum_weights": {
              "value": 30
            },
            "sum_productOfMarksAndWeight": {
              "value": 2070
            },
            "avg_score": {
              "value": 69
            },
            "weighted_avg": {
              "value": 69
            }
          },
          {
            "key": "s20",
            "doc_count": 1,
            "sum_weights": {
              "value": 80
            },
            "sum_productOfMarksAndWeight": {
              "value": 4320
            },
            "avg_score": {
              "value": 54
            },
            "weighted_avg": {
              "value": 54
            }
          }
        ]
      }
    }
  }
}
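If only the weighted averages are needed, the helper sums can be dropped on the client side. The following is a minimal Python sketch, assuming the response has already been parsed into a dict shaped like the one above; the function name `extract_weighted_avgs` is hypothetical, and the `response` literal is a trimmed copy of the output shown.

```python
def extract_weighted_avgs(response):
    """Hypothetical helper: map each subject key to its weighted_avg value."""
    # Path follows the aggregation names used in the query:
    # nested agg "subjects" -> terms agg "subjects" -> buckets.
    buckets = response["aggregations"]["subjects"]["subjects"]["buckets"]
    return {bucket["key"]: bucket["weighted_avg"]["value"] for bucket in buckets}


# Trimmed copy of the response shown above.
response = {
    "aggregations": {
        "subjects": {
            "doc_count": 6,
            "subjects": {
                "buckets": [
                    {"key": "s1", "doc_count": 2,
                     "sum_weights": {"value": 50},
                     "sum_productOfMarksAndWeight": {"value": 2660},
                     "avg_score": {"value": 51.5},
                     "weighted_avg": {"value": 53.2}},
                    {"key": "s2", "doc_count": 2,
                     "sum_weights": {"value": 140},
                     "sum_productOfMarksAndWeight": {"value": 8750},
                     "avg_score": {"value": 62.5},
                     "weighted_avg": {"value": 62.5}},
                    {"key": "s10", "doc_count": 1,
                     "sum_weights": {"value": 30},
                     "sum_productOfMarksAndWeight": {"value": 2070},
                     "avg_score": {"value": 69},
                     "weighted_avg": {"value": 69}},
                    {"key": "s20", "doc_count": 1,
                     "sum_weights": {"value": 80},
                     "sum_productOfMarksAndWeight": {"value": 4320},
                     "avg_score": {"value": 54},
                     "weighted_avg": {"value": 54}},
                ]
            },
        }
    }
}

weighted = extract_weighted_avgs(response)
print(weighted)
```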

I hope this helps. Let me know if it doesn't, and if you think this solves what you are looking for, please go ahead and accept this answer ;-)
