Pandas: How to fill null values with mean of a groupby?
Question
I have a dataset with some missing data that looks like this:
id category value
1 A NaN
2 B NaN
3 A 10.5
4 C NaN
5 A 2.0
6 B 1.0
I need to fill in the nulls to use the data in a model. Every time a category occurs for the first time, its value is null. For categories like A and B, which have more than one value, I want to replace the nulls with the average of that category. For category C, which occurs only once, I just want to fill in the average of the rest of the data.
I know that for cases like C I can simply take the average of all the rows, as shown below, but I'm stuck on computing the category-wise means for A and B and replacing the nulls with them.
df['value'] = df['value'].fillna(df['value'].mean())
I need the final df to look like this:
id category value
1 A 6.25
2 B 1.0
3 A 10.5
4 C 4.15
5 A 2.0
6 B 1.0
Recommended answer
I think you can use groupby and apply fillna with mean. You still get NaN if some category has only NaN values, so use the mean of all values of the column to fill the remaining NaN:
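# group_keys=False keeps the original row index so the result aligns with df
# (on pandas 2.x the default would add the category to the index)
df['value'] = df.groupby('category', group_keys=False)['value'].apply(lambda x: x.fillna(x.mean()))
# category C has only NaN, so its group mean is NaN; fall back to the column mean
df['value'] = df['value'].fillna(df['value'].mean())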
print (df)
id category value
0 1 A 6.25
1 2 B 1.00
2 3 A 10.50
3 4 C 4.15
4 5 A 2.00
5 6 B 1.00
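As an alternative sketch of the same two-step fill, the following uses transform("mean") rather than the apply call in the answer above. The DataFrame is reconstructed from the sample table in the question, and the names group_mean and filled are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "category": ["A", "B", "A", "C", "A", "B"],
    "value": [np.nan, np.nan, 10.5, np.nan, 2.0, 1.0],
})

# Step 1: per-category mean, broadcast back to the original row order
group_mean = df.groupby("category")["value"].transform("mean")
filled = df["value"].fillna(group_mean)

# Step 2: categories with no observed value (C) are still NaN here;
# fall back to the mean of the column as it stands after step 1
filled = filled.fillna(filled.mean())

df["value"] = filled
print(df)
```

Note that the fallback in step 2 is the mean of the column after the per-category fill, which gives 4.15 and matches the expected output in the question. The mean of only the originally observed values would be (10.5 + 2.0 + 1.0) / 3 = 4.5.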