How to replace missing values with group mode in Pandas?
Question
I followed the method in this post to replace missing values with the group mode, but I get `IndexError: index out of bounds`:
```python
df['SIC'] = df.groupby('CIK').SIC.apply(lambda x: x.fillna(x.mode()[0]))
```
I suspect this happens because some groups contain only missing values and therefore have no mode. Is there a way around this? Thank you!
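A quick way to check that guess, plus a guarded variant of the one-liner. This is a minimal sketch: the toy frame and the helper name `fill_group_mode` are hypothetical, and it uses `transform` rather than `apply` on the assumption that you want a result aligned with the original row index. `Series.mode()` drops `NaN` by default, so an all-`NaN` group produces an empty Series, and taking element 0 of it fails.

```python
import numpy as np
import pandas as pd

# An all-NaN Series has no mode: mode() returns an empty Series.
print(pd.Series([np.nan, np.nan]).mode().empty)  # True


def fill_group_mode(s: pd.Series) -> pd.Series:
    """Hypothetical helper: fill NaNs in one group with its mode, if it has one."""
    modes = s.mode()  # returned in sorted order, so ties resolve to the smallest value
    if modes.empty:
        return s  # all-NaN group: nothing to fill with, leave it unchanged
    return s.fillna(modes.iloc[0])


df = pd.DataFrame({"CIK": ["A", "A", "A", "B", "B"],
                   "SIC": [1.0, 1.0, np.nan, np.nan, np.nan]})
# transform keeps the original index, so the column assignment lines up row by row.
df["SIC"] = df.groupby("CIK")["SIC"].transform(fill_group_mode)
print(df)
```

Group `A` gets its missing value filled with `1.0`, while group `B` simply stays `NaN` instead of raising.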
Recommended answer
Computing a `mode` is quite difficult, given that there isn't really an agreed-upon way to deal with ties. Plus, it's typically very slow. Here's one way that will be "fast". We'll define a function that calculates the mode for each group, then fill in the missing values afterwards with a `map`. Groups whose values are all missing cause no errors (they simply stay `NaN`), though for ties we arbitrarily choose the modal value that comes first when sorted:
```python
def fast_mode(df, key_cols, value_col):
    """
    Calculate a column mode, by group, ignoring null values.

    Parameters
    ----------
    df : pandas.DataFrame
        DataFrame over which to calculate the mode.
    key_cols : list of str
        Columns to groupby for calculation of mode.
    value_col : str
        Column for which to calculate the mode.

    Returns
    -------
    pandas.DataFrame
        One row for the mode of value_col per key_cols group. If ties,
        returns the one which is sorted first.
    """
    # Count each (group, value) pair; groupby drops rows where value_col is NaN.
    return (df.groupby(key_cols + [value_col]).size()
              .to_frame('counts').reset_index()
              # A stable sort keeps the ascending value order among equal counts,
              # so drop_duplicates keeps the smallest modal value on ties.
              .sort_values('counts', ascending=False, kind='mergesort')
              .drop_duplicates(subset=key_cols)
              .drop(columns='counts'))
```
Sample data `df`:

```
   CIK  SIK
0    C  2.0
1    C  1.0
2    B  NaN
3    B  3.0
4    A  NaN
5    A  3.0
6    C  NaN
7    B  NaN
8    C  1.0
9    A  2.0
10   D  NaN
11   D  NaN
12   D  NaN
```
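Code:

```python
# Look up each row's group mode, then write it only into the rows where SIK is missing.
df.loc[df.SIK.isnull(), 'SIK'] = df.CIK.map(fast_mode(df, ['CIK'], 'SIK').set_index('CIK').SIK)
```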
Output `df`:

```
   CIK  SIK
0    C  2.0
1    C  1.0
2    B  3.0
3    B  3.0
4    A  2.0
5    A  3.0
6    C  1.0
7    B  3.0
8    C  1.0
9    A  2.0
10   D  NaN
11   D  NaN
12   D  NaN
```
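Group `D` keeps its `NaN`s because it never appears in the `fast_mode` result (the `groupby` there drops rows whose value is `NaN`), and `Series.map` returns `NaN` for any key that is missing from the mapping Series. A minimal sketch of that lookup step; the `modes` table below is written out by hand from the output above rather than computed with `fast_mode`:

```python
import numpy as np
import pandas as pd

# Per-group modes for the sample data; group D is absent because all of its SIK values are NaN.
modes = pd.Series({"A": 2.0, "B": 3.0, "C": 1.0})

keys = pd.Series(["A", "B", "C", "D"])
filled = keys.map(modes)  # keys missing from `modes` map to NaN
print(filled.tolist())  # [2.0, 3.0, 1.0, nan]
```

Since only the rows selected by `df.SIK.isnull()` are overwritten, existing values are never touched, and an all-missing group just receives `NaN` again.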