Pandas: How to fill null values with mean of a groupby?

Question

I have a dataset with some missing data that looks like this:

id    category     value
1     A            NaN
2     B            NaN
3     A            10.5
4     C            NaN
5     A            2.0
6     B            1.0

I need to fill in the nulls to use the data in a model. The first time a category occurs, its value is NULL. For categories such as A and B, which have more than one value, I want to replace the nulls with the average of that category. For category C, which occurs only once, I just want to fill in the average of the rest of the data.

I know that for cases like C I can simply fill with the average of all the rows, as in the line below, but I'm stuck on computing the category-wise means for A and B and using them to replace the nulls.

df['value'] = df['value'].fillna(df['value'].mean()) 

I need the final df to look like this:

id    category     value
1     A            6.25
2     B            1.0
3     A            10.5
4     C            4.15
5     A            2.0
6     B            1.0

Recommended answer

I think you can group by category and fill each group's NaN values with that group's mean. A category that contains only NaN values (such as C) has no mean of its own, so it is still NaN after this step; fill what remains with the mean of all values in the column:

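# Step 1: fill each NaN with the mean of its own category.
# transform keeps the original index, so the result assigns back cleanly.
df['value'] = df.groupby('category')['value'].transform(lambda x: x.fillna(x.mean()))
# Step 2: category C has no values at all, so fall back to the overall mean.
df['value'] = df['value'].fillna(df['value'].mean())
print(df)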
   id category  value
0   1        A   6.25
1   2        B   1.00
2   3        A  10.50
3   4        C   4.15
4   5        A   2.00
5   6        B   1.00
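
For reference, here is a self-contained sketch of the same two-step fill that can be run as is. The DataFrame is rebuilt from the sample table in the question, and the helper name `fill_with_group_mean` and its parameters are hypothetical, chosen only for illustration. It assumes a reasonably recent pandas and uses `transform("mean")`, which returns a Series aligned to the original index.

```python
import numpy as np
import pandas as pd


def fill_with_group_mean(df, group_col="category", value_col="value"):
    """Fill NaNs with the group mean, then with the overall mean as a fallback.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = df.copy()
    # Step 1: per-group mean, broadcast back to one value per row.
    group_means = out.groupby(group_col)[value_col].transform("mean")
    out[value_col] = out[value_col].fillna(group_means)
    # Step 2: groups with no observed values are still NaN here, so fall back
    # to the column mean computed after step 1.
    out[value_col] = out[value_col].fillna(out[value_col].mean())
    return out


# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "category": ["A", "B", "A", "C", "A", "B"],
    "value": [np.nan, np.nan, 10.5, np.nan, 2.0, 1.0],
})

filled = fill_with_group_mean(df)
print(filled)
```

Note that the fallback mean is taken after the group-wise fill, which is why C receives 4.15 (the mean of 6.25, 1.0, 10.5, 2.0 and 1.0). If the mean of only the originally observed values (4.5 here) is wanted instead, compute `df[value_col].mean()` before step 1 and use that in step 2.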
