Pandas DataFrame aggregate function using multiple columns

Question

Is there a way to write an aggregation function, like the ones passed to the DataFrame.agg method, that has access to more than one column of the data being aggregated? Typical use cases are weighted average and weighted standard deviation functions.

I would like to be able to write something like this:

from pandas import DataFrame

def wAvg(c, w):
    return (c * w).sum() / w.sum()

df = DataFrame(....)  # df has columns c and w; I want the weighted average
                      # of c, using w as the weights.
df.aggregate({"c": wAvg})  # ...and somehow tell it to use column w as the weights
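When no grouping is involved, you do not need DataFrame.agg for this result. Because wAvg takes two Series, you can call it on the two columns directly. The following is a minimal sketch with made-up sample data (the values in df are an assumption for illustration). It also shows that NumPy's np.average, with its weights argument, computes the same weighted mean.

```python
import numpy as np
import pandas as pd

def wAvg(c, w):
    # Weighted average of Series c, using Series w as the weights
    return (c * w).sum() / w.sum()

# Hypothetical sample data: value column c and weight column w
df = pd.DataFrame({"c": [1.0, 2.0, 3.0], "w": [1.0, 1.0, 2.0]})

result = wAvg(df["c"], df["w"])              # (1*1 + 2*1 + 3*2) / (1 + 1 + 2)
same = np.average(df["c"], weights=df["w"])  # NumPy's built-in weighted mean
print(result, same)
```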

Answer

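Yes: use the .apply(...) method of the GroupBy object. It calls your function once for each group, passing that group's sub-DataFrame, so the function can read as many columns as it needs. For example: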

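# df has 'data' and 'weights' columns; keys is the column name(s) to group by
grouped = df.groupby(keys)

def wavg(group):
    # group is the sub-DataFrame for one group, so both columns are available
    d = group['data']
    w = group['weights']
    return (d * w).sum() / w.sum()

grouped.apply(wavg)  # one weighted average per group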
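The same pattern covers the weighted standard deviation mentioned in the question. The function passed to apply can return a Series, and each entry becomes a column of the result. The following is a minimal sketch under stated assumptions. The sample data and the grouping column name "key" are hypothetical. The weighted standard deviation shown is the population (unbiased-by-weights, no correction) form, which is only one common definition. The value columns are selected before .apply so that the grouping column is not passed into each group, because recent pandas versions warn when apply operates on grouping columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "key" is the grouping column
df = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "data": [1.0, 3.0, 2.0, 4.0, 6.0],
    "weights": [1.0, 1.0, 1.0, 2.0, 1.0],
})

def wavg(group):
    # Weighted mean of 'data' using 'weights'
    d = group["data"]
    w = group["weights"]
    return (d * w).sum() / w.sum()

def wstd(group):
    # Population-style weighted standard deviation (an assumed definition):
    # square root of the weighted mean of squared deviations from the weighted mean
    d = group["data"]
    w = group["weights"]
    mean = (d * w).sum() / w.sum()
    return np.sqrt((w * (d - mean) ** 2).sum() / w.sum())

def wstats(group):
    # Returning a Series makes apply build one column per entry
    return pd.Series({"wavg": wavg(group), "wstd": wstd(group)})

# Select the value columns so each group holds only 'data' and 'weights'
stats = df.groupby("key")[["data", "weights"]].apply(wstats)
print(stats)
```

Here, group "a" has a weighted mean of 2.0 and a weighted standard deviation of 1.0. Group "b" has a weighted mean of 4.0 and a weighted standard deviation of sqrt(2).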
