How to replace missing values with group mode in Pandas?

Published: 2020/5/9 23:13:58 · Tags: python, pandas, missing-data, imputation


Question


I followed the method in this post to replace missing values with the group mode, but I get `IndexError: index out of bounds`:

```python
df['SIC'] = df.groupby('CIK').SIC.apply(lambda x: x.fillna(x.mode()[0]))
```


I guess this is probably because some groups have all missing values and do not have a mode. Is there a way to get around this? Thank you!

Recommended answer


`mode` is tricky: there is no agreed-upon way to break ties, and it is typically very slow when computed per group. Here is one faster approach. We define a function that calculates the mode for each group, then fill the missing values with `map`. Groups that contain only missing values cause no error (they are simply left as NaN), and for ties we arbitrarily choose the modal value that comes first when sorted:

```python
import pandas as pd


def fast_mode(df, key_cols, value_col):
    """
    Calculate a column mode, by group, ignoring null values.

    Parameters
    ----------
    df : pandas.DataFrame
        DataFrame over which to calculate the mode.
    key_cols : list of str
        Columns to group by for calculation of mode.
    value_col : str
        Column for which to calculate the mode.

    Returns
    -------
    pandas.DataFrame
        One row for the mode of value_col per key_cols group. If there
        are ties, returns the value that is sorted first.
    """
    return (df.groupby(key_cols + [value_col]).size()  # count each (group, value) pair; NaN keys are dropped
              .to_frame('counts').reset_index()
              # Most frequent first; ties go to the smaller value. Sorting on both
              # columns makes the tie rule explicit (the default single-column sort is not stable)
              .sort_values(['counts', value_col], ascending=[False, True])
              .drop_duplicates(subset=key_cols)  # keep the top row for each group
              .drop(columns='counts'))
```

Sample data `df`:

```
   CIK  SIK
0    C  2.0
1    C  1.0
2    B  NaN
3    B  3.0
4    A  NaN
5    A  3.0
6    C  NaN
7    B  NaN
8    C  1.0
9    A  2.0
10   D  NaN
11   D  NaN
12   D  NaN
```
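To make the example reproducible, the sample frame can be rebuilt as below (the values are copied from the table above). The block also runs the counting step that `fast_mode` starts with, which shows why all-NaN groups cause no error: `groupby` drops rows whose key contains NaN by default (`dropna=True`), so group D never appears in the counts, while group A has a tie (2.0 and 3.0 once each). This is a sketch for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# The answer's sample data, rebuilt so the example can be run directly
df = pd.DataFrame({
    'CIK': ['C', 'C', 'B', 'B', 'A', 'A', 'C', 'B', 'C', 'A', 'D', 'D', 'D'],
    'SIK': [2.0, 1.0, np.nan, 3.0, np.nan, 3.0, np.nan, np.nan, 1.0, 2.0,
            np.nan, np.nan, np.nan],
})

# First step of fast_mode: count each (CIK, SIK) pair.
# Rows whose SIK is NaN are dropped from the grouping keys, so group D
# (all NaN) is absent, and map() will later return NaN for it.
counts = df.groupby(['CIK', 'SIK']).size()
print(counts)
```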

Code:

```python
modes = fast_mode(df, ['CIK'], 'SIK').set_index('CIK')['SIK']
df.loc[df['SIK'].isnull(), 'SIK'] = df['CIK'].map(modes)
```

Output `df`:

```
   CIK  SIK
0    C  2.0
1    C  1.0
2    B  3.0
3    B  3.0
4    A  2.0
5    A  3.0
6    C  1.0
7    B  3.0
8    C  1.0
9    A  2.0
10   D  NaN
11   D  NaN
12   D  NaN
```
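For comparison, here is a minimal sketch that keeps the question's original per-group `fillna` idea and avoids the error. It assumes, as the question guesses, that the failure comes from all-NaN groups: for those, `Series.mode()` returns an empty Series, so `x.mode()[0]` has nothing to index. The hypothetical helper `fill_with_group_mode` returns such groups unchanged, and `transform` is used instead of `apply` so the result lines up with the original index. This is an alternative to the answer's method, not part of it, and is likely slower on many groups because the function runs once per group. It reuses the answer's sample data (column `SIK`; the question calls it `SIC`).

```python
import numpy as np
import pandas as pd


def fill_with_group_mode(s):
    """Hypothetical helper: fill NaNs in one group's Series with that group's mode."""
    modes = s.mode()  # NaN is ignored; the result is empty if every value is NaN
    if modes.empty:
        return s  # no mode exists, so leave the all-NaN group as it is
    return s.fillna(modes.iloc[0])  # mode() returns values sorted, so ties pick the smallest


df = pd.DataFrame({
    'CIK': ['C', 'C', 'B', 'B', 'A', 'A', 'C', 'B', 'C', 'A', 'D', 'D', 'D'],
    'SIK': [2.0, 1.0, np.nan, 3.0, np.nan, 3.0, np.nan, np.nan, 1.0, 2.0,
            np.nan, np.nan, np.nan],
})

# transform keeps the original row order and index
df['SIK'] = df.groupby('CIK')['SIK'].transform(fill_with_group_mode)
print(df)
```

On this data it produces the same filled values as the output shown above, with group D left as NaN.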

