How to fill missing values in a dataframe based on group value counts?
Question
I have a pandas DataFrame with two columns: year (int) and condition (string). The condition column contains a NaN value, and I want to replace it using information from a groupby operation.
import pandas as pd
import numpy as np
year = [2015, 2016, 2017, 2016, 2016, 2017, 2015, 2016, 2015, 2015]
cond = ["good", "good", "excellent", "good", 'excellent','excellent', np.nan, 'good','excellent', 'good']
X = pd.DataFrame({'year': year, 'condition': cond})
stat = X.groupby('year')['condition'].value_counts()
This gives:
print(X)
   year  condition
0  2015       good
1  2016       good
2  2017  excellent
3  2016       good
4  2016  excellent
5  2017  excellent
6  2015        NaN
7  2016       good
8  2015  excellent
9  2015       good
print(stat)
year  condition
2015  good         2
      excellent    1
2016  good         3
      excellent    1
2017  excellent    2
The NaN value in the row with index 6 has year = 2015, and stat shows that the most frequent condition for 2015 is 'good', so I want to replace this NaN with 'good'.
I have tried the fillna and .transform methods, but it does not work :(
Any help would be appreciated.
Recommended answer
I did a little extra transformation to turn stat into a dictionary mapping each year to its most frequent condition (credit to this answer):
In[0]:
fill_dict = stat.unstack().idxmax(axis=1).to_dict()
fill_dict
Out[0]:
{2015: 'good', 2016: 'good', 2017: 'excellent'}
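To make the one-liner above easier to follow, here is a minimal sketch that splits it into its intermediate steps, using the data from the question. The name `wide` is only an illustrative helper and does not appear in the original answer. Note that `idxmax` returns the first column label when two conditions tie, so the result for tied years depends on column order.

```python
import numpy as np
import pandas as pd

year = [2015, 2016, 2017, 2016, 2016, 2017, 2015, 2016, 2015, 2015]
cond = ["good", "good", "excellent", "good", "excellent",
        "excellent", np.nan, "good", "excellent", "good"]
X = pd.DataFrame({"year": year, "condition": cond})

# Counts per (year, condition); NaN entries are ignored by value_counts.
stat = X.groupby("year")["condition"].value_counts()

# Pivot the inner index level into columns: one row per year, one column
# per condition, with NaN where a condition never occurs in that year.
wide = stat.unstack()

# For each year (row), take the column label holding the largest count.
fill_dict = wide.idxmax(axis=1).to_dict()
print(fill_dict)
```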
Then use fillna with map based on this dictionary (credit to this answer):
In[0]:
X['condition'] = X['condition'].fillna(X['year'].map(fill_dict))
X
Out[0]:
   year  condition
0  2015       good
1  2016       good
2  2017  excellent
3  2016       good
4  2016  excellent
5  2017  excellent
6  2015       good
7  2016       good
8  2015  excellent
9  2015       good
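Since the question mentions trying fillna together with .transform, here is one possible way to combine them, offered as a sketch and not as part of the original answer. The helper `most_frequent` is a hypothetical name introduced for illustration. It relies on `Series.mode`, which skips NaN by default and returns the modes in sorted order, so ties resolve to the alphabetically first value. This may differ from the `idxmax` approach only in how ties are broken.

```python
import numpy as np
import pandas as pd

year = [2015, 2016, 2017, 2016, 2016, 2017, 2015, 2016, 2015, 2015]
cond = ["good", "good", "excellent", "good", "excellent",
        "excellent", np.nan, "good", "excellent", "good"]
X = pd.DataFrame({"year": year, "condition": cond})


def most_frequent(s):
    """Return the most frequent non-null value of s, or NaN if there is none."""
    modes = s.mode()  # drops NaN by default
    return modes.iloc[0] if not modes.empty else np.nan


# transform broadcasts each group's scalar result back to the original rows,
# so mode_per_year is aligned with X and can be passed straight to fillna.
mode_per_year = X.groupby("year")["condition"].transform(most_frequent)
X["condition"] = X["condition"].fillna(mode_per_year)
print(X)
```

A year whose conditions are all missing would stay NaN under this sketch, because there is nothing to infer the fill value from.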