Pandas Group Weighted Average of Multiple Columns


Problem description

Say I have the following DataFrame:

>>> import numpy as np
>>> import pandas as pd
>>> df=pd.DataFrame({'category':['a','a','b','b'],
... 'var1':np.random.randint(0,100,4),
... 'var2':np.random.randint(0,100,4),
... 'weights':np.random.randint(0,10,4)})
>>> df
  category  var1  var2  weights
0        a    37    36        7
1        a    47    20        1
2        b    33     7        6
3        b    16     6        8

I can calculate the weighted average of 'var1' like this:

>>> Grouped=df.groupby('category')
>>> GetWeightAvg=lambda g: np.average(g['var1'], weights=g['weights'])
>>> Grouped.apply(GetWeightAvg)
category
a    38.250000
b    23.285714
dtype: float64

However, I am wondering whether I can write the function so that, when I apply it to the grouped object, I can specify which column to compute the weighted average for (or both). Rather than hard-coding 'var1' in the function, I'd like to pass the column name when applying it.

For example, I can get the unweighted average of both columns like this:

>>> Grouped[['var1','var2']].mean()
          var1  var2
category            
a         42.0  28.0
b         24.5   6.5

Is there an analogous way to do this with weighted averages?

Recommended answer

You can use apply and return both averages as a Series:

In [10]: g = df.groupby("category")  # grouped object used below

In [11]: g.apply(lambda x: pd.Series(np.average(x[["var1", "var2"]], weights=x["weights"], axis=0), ["var1", "var2"]))
Out[11]:
               var1       var2
category
a         38.250000  34.000000
b         23.285714   6.428571
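
To see what `np.average(..., weights=..., axis=0)` is doing inside the lambda, here is a small sketch that reproduces the group 'a' row by hand. It uses the rows shown in the question's example DataFrame; the variable names (`values`, `by_numpy`, `by_hand`) are only illustrative.

```python
import numpy as np

# Rows of group 'a' from the question; the columns are var1 and var2.
values = np.array([[37, 36],
                   [47, 20]])
weights = np.array([7, 1])

# axis=0 averages down the rows, so each column gets its own weighted mean.
by_numpy = np.average(values, weights=weights, axis=0)

# The same quantity written out: sum(w * x) / sum(w), column by column.
by_hand = (values * weights[:, None]).sum(axis=0) / weights.sum()

print(by_numpy)
print(by_hand)
```

Both lines should print the same two numbers, matching the 'a' row of the output above (38.25 for var1 and 34.0 for var2).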

This is slightly cleaner written as a function that takes the columns as an argument:

In [21]: def weighted(x, cols, w="weights"):
             return pd.Series(np.average(x[cols], weights=x[w], axis=0), cols)

In [22]: g.apply(weighted, ["var1", "var2"])
Out[22]:
               var1       var2
category
a         38.250000  34.000000
b         23.285714   6.428571
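
For reference, here is a self-contained sketch of the same approach as a script, using the fixed values from the question instead of random data. Two things in it are assumptions rather than part of the answer: selecting `cols + ["weights"]` before calling `apply` (so only the needed columns reach the function), and the apply-free cross-check at the end, which computes the same result as a sum of weighted values divided by the sum of weights.

```python
import numpy as np
import pandas as pd

# Fixed data copied from the question's example output.
df = pd.DataFrame({
    "category": ["a", "a", "b", "b"],
    "var1": [37, 47, 33, 16],
    "var2": [36, 20, 7, 6],
    "weights": [7, 1, 6, 8],
})


def weighted(x, cols, w="weights"):
    """Weighted mean of each column in `cols` for one group."""
    return pd.Series(np.average(x[cols], weights=x[w], axis=0), index=cols)


cols = ["var1", "var2"]

# Extra positional arguments to apply() are forwarded to `weighted`.
# Selecting the columns first is an assumption made here, not in the answer.
result = df.groupby("category")[cols + ["weights"]].apply(weighted, cols)

# Cross-check without apply: sum(w * x) / sum(w) per group.
weighted_sums = df[cols].mul(df["weights"], axis=0).groupby(df["category"]).sum()
weight_totals = df.groupby("category")["weights"].sum()
alt = weighted_sums.div(weight_totals, axis=0)

print(result)
print(alt)
```

The two printed tables should agree. To compute a single column, pass a one-element list such as `["var1"]` as `cols`.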
