Python - Take weighted average inside Pandas groupby while ignoring NaN


Question

I need to group a Pandas DataFrame by date and then take a weighted average of given values. Here is how it is currently done, using the margin value as an example (it works perfectly until there are NaN values):

import numpy as np

df = orders.copy()
# Create new columns as required
df['margin_WA'] = df['net_margin'].astype(float)    # original data is str or Decimal

def group_wa():
    # Weight each group's values by the matching rows of 'order_amount'
    return lambda num: np.average(num, weights=df.loc[num.index, 'order_amount'])

agg_func = {
    'margin_WA': group_wa(),    # agg_func includes WAs for other elements
}

result = df.groupby('order_date').agg(agg_func)

result['margin_WA'] = result['margin_WA'].astype(str)

When the 'net_margin' field contains NaN values, the weighted average (WA) for that group comes out as NaN. I can't seem to use dropna() or filter by pd.notnull when creating the new columns, and I don't know where to create a masked array to avoid passing NaN to the group_wa function (as suggested here). How do I ignore NaN in this case?
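To see why the result turns into NaN, here is a minimal sketch with made-up numbers (the arrays below are illustrative, not taken from the question's data). It shows that np.average propagates a single NaN, and that masking values and weights with the same boolean mask avoids it:

```python
import numpy as np

# Hypothetical margins and order amounts for one date
values = np.array([0.10, np.nan, 0.30])
weights = np.array([100.0, 50.0, 200.0])

# A single NaN in the values propagates through the weighted sum
naive = np.average(values, weights=weights)

# Applying the same mask to both arrays keeps values and weights aligned
mask = ~np.isnan(values)
masked = np.average(values[mask], weights=weights[mask])

print(naive, masked)
```

The important detail is that the weights must be filtered together with the values; dropping NaN from only one of the two would misalign them.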

Recommended answer

I think a simple solution is to drop the missing values before the groupby/aggregate step, for example:

result = df.dropna(subset=['margin_WA']).groupby('order_date').agg(agg_func)

This way, no rows with a missing margin_WA (and so none of their index labels) are passed to your group_wa function.
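As a rough end-to-end illustration of dropping rows first, the sketch below uses a small made-up DataFrame; the column names follow the question, but the data is hypothetical and group_wa is written as a plain function rather than the question's lambda factory:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question
df = pd.DataFrame({
    "order_date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
    "order_amount": [100.0, 300.0, 200.0, 50.0],
    "margin_WA": [0.10, 0.20, np.nan, 0.40],
})

def group_wa(series):
    # Look up the weights by index label, so they match the surviving rows
    return np.average(series, weights=df.loc[series.index, "order_amount"])

# dropna keeps the original index labels, so the lookup above still works
result = (
    df.dropna(subset=["margin_WA"])
      .groupby("order_date")
      .agg({"margin_WA": group_wa})
)
print(result)
```

One thing to keep in mind with this approach: a date whose rows are all NaN is removed before grouping, so it will not appear in the result at all. Also, if agg_func covers several columns, the rows dropped for margin_WA are dropped for the other columns' aggregates too.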

Another approach is to move the dropna into the aggregating function itself, for example:

def group_wa(series):
    dropped = series.dropna()
    return np.average(dropped, weights=df.loc[dropped.index, 'order_amount'])

agg_func = {'margin_WA': group_wa}
result = df.groupby('order_date').agg(agg_func)
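One edge case with the in-function variant: if every value in a group is NaN, the array left after dropna is empty and np.average raises ZeroDivisionError because the weights sum to zero. A possible guard is sketched below; group_wa_safe and the sample data are hypothetical names and values used only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 2020-01-02 has only a NaN margin
df = pd.DataFrame({
    "order_date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-03"],
    "order_amount": [100.0, 300.0, 200.0, 50.0],
    "margin_WA": [0.10, np.nan, np.nan, 0.40],
})

def group_wa_safe(series):
    dropped = series.dropna()
    weights = df.loc[dropped.index, "order_amount"]
    # Nothing left to average, or weights that cannot be normalized
    if dropped.empty or weights.sum() == 0:
        return np.nan
    return np.average(dropped, weights=weights)

result = df.groupby("order_date").agg({"margin_WA": group_wa_safe})
print(result)
```

Unlike dropping rows before the groupby, this keeps every date in the result and only affects the margin_WA column, so other weighted averages in agg_func still see all of their rows.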
