pandas GroupBy columns with NaN (missing) values
Question
I have a DataFrame with many missing values in the columns I wish to group by:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': ['1', '2', '3'], 'b': ['4', np.nan, '6']})
In [4]: df.groupby('b').groups
Out[4]: {'4': [0], '6': [2]}
See that pandas has dropped the rows with NaN target values. (I want to include these rows!)
Since I need to do many such operations (many columns have missing values), and use more complicated functions than just medians (typically random forests), I would like to avoid writing overly complicated code.
Any suggestions? Should I write a function for this or is there a simple solution?
Accepted answer
NA groups in GroupBy are automatically excluded. This behavior is consistent with R.
One workaround is to use a placeholder before doing the groupby (e.g. -1):
In [11]: df.fillna(-1)
Out[11]:
a b
0 1 4
1 2 -1
2 3 6
In [12]: df.fillna(-1).groupby('b').sum()
Out[12]:
a
b
-1 2
4 1
6 3
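The snippets above can be combined into a placeholder round-trip: fill NaN with a sentinel, group, then restore NaN in the result's index. A minimal sketch, where the `'missing'` sentinel and the `rename` step are illustrative assumptions (any value that cannot collide with real keys works):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['4', np.nan, '6']})

# Fill NaN in the grouping column with a sentinel that cannot
# collide with real keys, then group and aggregate as usual.
sentinel = 'missing'  # assumed safe for this string-typed column
result = df.fillna(sentinel).groupby('b').sum()

# Restore NaN in the result's index so downstream code sees
# the original missing-value marker rather than the sentinel.
result = result.rename(index={sentinel: np.nan})
print(result)
```

The sentinel must match the column's dtype (a string here, `-1` for a numeric column), which is part of why this feels fragile.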
That said, this feels like a pretty awful hack... perhaps there should be an option to include NaN in groupby (see this GitHub issue, which uses the same placeholder hack).
However, as mentioned in another answer, since pandas 1.1 you have better control over this behavior: NaN values can now be kept in the grouper with dropna=False.
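Applied to the DataFrame from the question, the pandas >= 1.1 option looks like this (a short sketch, no sentinel needed):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['4', np.nan, '6']})

# dropna=False (pandas >= 1.1) keeps NaN as its own group,
# instead of silently dropping those rows.
print(df.groupby('b', dropna=False)['a'].sum())
```

With the default `dropna=True` the NaN row is excluded, so the two calls produce a different number of groups.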