How to fill missing values in a dataframe based on group value counts?

Views: 168. Published: 2020/6/24 18:30:32. Tags: python, pandas, dataframe, pandas-groupby, fillna


Question

I have a pandas DataFrame with two columns: year (int) and condition (string). The condition column contains a NaN value, and I want to replace it using information from a groupby operation.

import pandas as pd
import numpy as np

year = [2015, 2016, 2017, 2016, 2016, 2017, 2015, 2016, 2015, 2015]
cond = ['good', 'good', 'excellent', 'good', 'excellent', 'excellent', np.nan, 'good', 'excellent', 'good']

X = pd.DataFrame({'year': year, 'condition': cond})
# Count how often each condition occurs within each year (NaN is not counted)
stat = X.groupby('year')['condition'].value_counts()

This gives:

print(X)
   year  condition
0  2015       good
1  2016       good
2  2017  excellent
3  2016       good
4  2016  excellent
5  2017  excellent
6  2015        NaN
7  2016       good
8  2015  excellent
9  2015       good

print(stat)
year  condition
2015  good         2
      excellent    1
2016  good         3
      excellent    1
2017  excellent    2

The NaN value is in the row with index 6, where year = 2015. From stat I can see that the most frequent condition for 2015 is 'good', so I want to replace this NaN with 'good'.

I have tried fillna together with the .transform method, but it does not work :(

I would appreciate any help.

Recommended answer

I added one extra transformation to turn stat into a dictionary that maps each year to its most frequent condition (credit to this answer):

In[0]:
fill_dict = stat.unstack().idxmax(axis=1).to_dict()
fill_dict

Out[0]:
{2015: 'good', 2016: 'good', 2017: 'excellent'}
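
To see why this one-liner works, it can help to look at the intermediate table that unstack() produces. The following is a minimal sketch that rebuilds the data from the question; the name counts_table is introduced here only for illustration and does not appear in the original answer.

```python
import numpy as np
import pandas as pd

year = [2015, 2016, 2017, 2016, 2016, 2017, 2015, 2016, 2015, 2015]
cond = ["good", "good", "excellent", "good", "excellent", "excellent",
        np.nan, "good", "excellent", "good"]
X = pd.DataFrame({"year": year, "condition": cond})

# value_counts() ignores NaN, so the missing entry does not affect the counts.
stat = X.groupby("year")["condition"].value_counts()

# unstack() pivots the inner index level (condition) into columns:
# one row per year, one column per condition, NaN where a pair never occurs.
counts_table = stat.unstack()  # hypothetical name, for illustration only

# idxmax(axis=1) returns, for each row, the column label with the largest count.
fill_dict = counts_table.idxmax(axis=1).to_dict()

print(counts_table)
print(fill_dict)
```

One thing to keep in mind: if two conditions are tied within a year, idxmax should return the first matching column, so the choice follows the column order of the unstacked table rather than anything meaningful in the data.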

Then use fillna with map, based on this dictionary (credit to this answer):

In[0]:
X['condition'] = X['condition'].fillna(X['year'].map(fill_dict))
X

Out[0]:
   year  condition
0  2015       good
1  2016       good
2  2017  excellent
3  2016       good
4  2016  excellent
5  2017  excellent
6  2015       good
7  2016       good
8  2015  excellent
9  2015       good
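
Since the question mentions trying fillna together with .transform, here is a hedged sketch of how that combination might be made to work in a single step. It is an alternative to the dictionary approach above, not the answerer's method. It assumes that every year has at least one non-missing condition; for a year with only NaN values, mode() returns an empty Series and .iloc[0] would raise an IndexError. The names most_common and filled are introduced here for illustration.

```python
import numpy as np
import pandas as pd

year = [2015, 2016, 2017, 2016, 2016, 2017, 2015, 2016, 2015, 2015]
cond = ["good", "good", "excellent", "good", "excellent", "excellent",
        np.nan, "good", "excellent", "good"]
X = pd.DataFrame({"year": year, "condition": cond})

# For each year, take the most frequent condition (mode() skips NaN by default)
# and broadcast it back to every row belonging to that year.
most_common = X.groupby("year")["condition"].transform(lambda s: s.mode().iloc[0])

# Fill only the missing entries; existing values are left untouched.
filled = X["condition"].fillna(most_common)

print(filled.tolist())
```

When several conditions are tied within a year, mode() returns all of them in sorted order, so .iloc[0] picks the alphabetically first one. Whether that tie-breaking rule is acceptable depends on the data.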
