How do you fill NaN with the mean of a subset of a group?
Problem description
I have a data frame with some values by `year` and `type`. I want to replace all NaN values in each year with the mean of that year's values of one specific type. I would like to do this as elegantly as possible, and since I'm dealing with a lot of data, less computation would also be good.
Example:
```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [1, 1, 1, 2, 2, 2],
                   'type': [1, 1, 2, 1, 1, 2],
                   'val': [np.nan, 5, 10, 100, 200, np.nan]})
```
I want ALL NaNs, regardless of their type, to be replaced with the mean of the type 1 values in their respective year.
In this example, the NaN in the first row should be replaced with 5 and the one in the last row with 150.
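A quick way to confirm those target numbers (the snippet rebuilds the example frame so it runs on its own): pandas' `mean` skips NaN by default, so year 1's type 1 mean comes from the single value 5 rather than becoming NaN, and year 2's is (100 + 200) / 2.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [1, 1, 1, 2, 2, 2],
                   'type': [1, 1, 2, 1, 1, 2],
                   'val': [np.nan, 5, 10, 100, 200, np.nan]})

# Mean of the type 1 values in each year; NaN entries are skipped by default
type1_means = df[df['type'] == 1].groupby('year')['val'].mean()
print(type1_means)
# year 1 -> 5.0   (only the value 5 counts)
# year 2 -> 150.0 ((100 + 200) / 2)
```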
My attempt below only fills in values that are missing for type 1, not type 2:
```python
# Attempt: the transform result only covers the type 1 rows
df['val'] = df['val'].fillna(df.query('type == 1').groupby('year')['val'].transform('mean'))
```
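The attempt misses type 2 because of index alignment. `transform` returns a Series indexed like its input, and the input here is the type 1 subset only. `fillna` with a Series fills by matching index labels, so a row whose label is absent from that Series (the type 2 row 5) gets no fill value. A small sketch that makes this visible:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [1, 1, 1, 2, 2, 2],
                   'type': [1, 1, 2, 1, 1, 2],
                   'val': [np.nan, 5, 10, 100, 200, np.nan]})

# transform() keeps the index of the filtered frame: only the type 1 rows 0, 1, 3, 4
subset_means = df.query('type == 1').groupby('year')['val'].transform('mean')
print(subset_means.index.tolist())  # [0, 1, 3, 4]

# fillna aligns on index labels, so label 5 (a type 2 row) stays NaN
attempt = df['val'].fillna(subset_means)
print(attempt.tolist())  # [5.0, 5.0, 10.0, 100.0, 200.0, nan]
```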
Recommended answer
Use `mask` and `transform`:
```python
# Hide the non-type-1 values, then give every row its year's mean of what remains
df.fillna({'val': df.val.mask(df.type.ne(1)).groupby(df.year).transform('mean')})
```
```
   year  type    val
0     1     1    5.0
1     1     1    5.0
2     1     2   10.0
3     2     1  100.0
4     2     1  200.0
5     2     2  150.0
```
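To see what each step of the answer does, the sketch below prints the intermediate Series. `mask(cond)` replaces values where `cond` is True with NaN, so every non-type-1 value is hidden; `groupby(...).transform('mean')` then broadcasts each year's mean back to every row of that year, including the type 2 rows, which is why row 5 gets filled. The last part is an alternative that is not from the original answer: it computes one mean per year on the type 1 subset and broadcasts it with `Series.map`. Whether it is actually faster on large data is an assumption worth measuring on your own frames.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [1, 1, 1, 2, 2, 2],
                   'type': [1, 1, 2, 1, 1, 2],
                   'val': [np.nan, 5, 10, 100, 200, np.nan]})

# Step 1: hide every value whose type is not 1 (row 2's 10 becomes NaN)
masked = df['val'].mask(df['type'].ne(1))
print(masked.tolist())  # [nan, 5.0, nan, 100.0, 200.0, nan]

# Step 2: broadcast each year's mean of the remaining values to every row in that year
year_means = masked.groupby(df['year']).transform('mean')
print(year_means.tolist())  # [5.0, 5.0, 5.0, 150.0, 150.0, 150.0]

# Step 3: fill only the missing 'val' entries from the index-aligned Series
result = df.fillna({'val': year_means})

# Alternative (not from the original answer): one mean per year, mapped onto 'year'
per_year = df.loc[df['type'].eq(1)].groupby('year')['val'].mean()
alt = df.assign(val=df['val'].fillna(df['year'].map(per_year)))

# Both approaches produce the same frame
assert result.equals(alt)
```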