Pandas: filling missing values by weighted average in each group

Problem description

I have a DataFrame whose 'value' column has missing values. I'd like to fill the missing values with the weighted average within each 'name' group. There is a post on how to fill missing values with the simple average of each group, but not with a weighted average. Thanks a lot!

import numpy as np
import pandas as pd
df = pd.DataFrame({'value': [1, np.nan, 3, 2, 3, 1, 3, np.nan, np.nan], 'weight': [3, 1, 1, 2, 1, 2, 2, 1, 1], 'name': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C']})


   name  value  weight
0    A    1.0       3
1    A    NaN       1
2    A    3.0       1
3    B    2.0       2
4    B    3.0       1
5    B    1.0       2
6    C    3.0       2
7    C    NaN       1
8    C    NaN       1

I'd like to fill in each NaN with the weighted average of the non-missing values in its 'name' group, i.e.:

   name  value  weight
0    A    1.0       3
1    A    1.5       1
2    A    3.0       1
3    B    2.0       2
4    B    3.0       1
5    B    1.0       2
6    C    3.0       2
7    C    3.0       1
8    C    3.0       1
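For reference, the filled values above follow from the usual weighted mean taken over the non-missing rows of each group, where $v_i$ are the observed values and $w_i$ their weights. Group A uses rows 0 and 2; group C uses only row 6:

```latex
\bar{v}_g = \frac{\sum_{i \in g,\ v_i \neq \mathrm{NaN}} w_i v_i}{\sum_{i \in g,\ v_i \neq \mathrm{NaN}} w_i},
\qquad
\bar{v}_A = \frac{3 \cdot 1 + 1 \cdot 3}{3 + 1} = \frac{6}{4} = 1.5,
\qquad
\bar{v}_C = \frac{2 \cdot 3}{2} = 3
```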

Recommended answer

You can group the DataFrame by name and use the fillna method to fill the missing values with the weighted average, which can be calculated with np.average and its weights parameter:

df['value'] = (df.groupby('name', group_keys=False)
                 .apply(lambda g: g.value.fillna(np.average(g.dropna().value, weights=g.dropna().weight))))

df
#  name  value  weight
#0    A    1.0       3
#1    A    1.5       1
#2    A    3.0       1
#3    B    2.0       2
#4    B    3.0       1
#5    B    1.0       2
#6    C    3.0       2
#7    C    3.0       1
#8    C    3.0       1
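As an alternative that is not part of the original answer, the same fill can be sketched without groupby.apply: give missing rows a weight of zero, compute each group's weighted sum and weight total with GroupBy.transform('sum'), and divide. This is a minimal sketch assuming the df from the question; the intermediate names w, num and den are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'value': [1, np.nan, 3, 2, 3, 1, 3, np.nan, np.nan],
                   'weight': [3, 1, 1, 2, 1, 2, 2, 1, 1],
                   'name': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C']})

# Zero out the weight of rows whose 'value' is missing, so they do not
# count toward the denominator of the weighted mean.
w = df['weight'].where(df['value'].notna(), 0)

# Per-group sums, broadcast back to every row of the group.
# value * w is NaN for missing rows; the groupby sum skips NaN by default.
num = (df['value'] * w).groupby(df['name']).transform('sum')
den = w.groupby(df['name']).transform('sum')

# Fill each NaN with its group's weighted mean.
# A group with no observed values would give 0 / 0, i.e. NaN, and stay unfilled.
df['value'] = df['value'].fillna(num / den)
print(df)
```

Because transform returns a result aligned to the original index, no group_keys handling is needed, and the whole computation stays in vectorized pandas operations.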


To make this less convoluted, define a fillValue function:

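import numpy as np
import pandas as pd

def fillValue(g):
    # Rows of this group with no missing values
    gNotNull = g.dropna()
    # Weighted average of the observed values, weighted by 'weight'
    wtAvg = np.average(gNotNull.value, weights=gNotNull.weight)
    # Replace this group's NaNs with that average
    return g.value.fillna(wtAvg)

# group_keys=False keeps the original row index, so the result aligns with df
df['value'] = df.groupby('name', group_keys=False).apply(fillValue)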
