Python comparison ignoring nan


Question

While nan == nan is always False, in many cases people want to treat NaNs as equal, and this is enshrined in pandas.DataFrame.equals:

NaNs in the same location are considered equal.

I can, of course, write

import math

def equalp(x, y):
    return (x == y) or (math.isnan(x) and math.isnan(y))

However, this fails on containers such as [float("nan")], and math.isnan raises a TypeError on non-numbers (so the complexity increases).
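To make the two failure modes concrete, here is a minimal sketch that calls equalp on a scalar, a list, and a pair of strings. The wrapper try_equalp is a hypothetical helper added only for illustration; it is not part of the original question.

```python
import math

def equalp(x, y):
    return (x == y) or (math.isnan(x) and math.isnan(y))

def try_equalp(x, y):
    """Return equalp's result, or the name of the TypeError it raises."""
    try:
        return equalp(x, y)
    except TypeError as exc:
        return type(exc).__name__

# Two distinct NaN floats: == is False, but both isnan checks pass.
scalar_result = try_equalp(float("nan"), float("nan"))

# Two lists holding distinct NaN objects: == is False, then
# math.isnan is handed a list and raises TypeError.
list_result = try_equalp([float("nan")], [float("nan")])

# Unequal strings: == is False, then math.isnan is handed a str.
string_result = try_equalp("a", "b")

print(scalar_result, list_result, string_result)
```

The scalar case works as intended, while the container and non-numeric cases never reach a meaningful comparison.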

So, what do people do to compare complex Python objects which may contain nan?

PS. Motivation: when comparing two rows in a pandas DataFrame, I would convert them into dicts and compare the dicts element-wise.

PPS. When I say "compare", I am thinking diff, not equalp.

Recommended answer

Suppose you have a DataFrame with nan values:

In [10]: df = pd.DataFrame(np.random.randint(0, 20, (10, 10)).astype(float), columns=["c%d"%d for d in range(10)])

In [10]: df.where(np.random.randint(0,2, df.shape).astype(bool), np.nan, inplace=True)

In [10]: df
Out[10]:
     c0    c1    c2    c3    c4    c5    c6    c7   c8    c9
0   NaN   6.0  14.0   NaN   5.0   NaN   2.0  12.0  3.0   7.0
1   NaN   6.0   5.0  17.0   NaN   NaN  13.0   NaN  NaN   NaN
2   NaN  17.0   NaN   8.0   6.0   NaN   NaN  13.0  NaN   NaN
3   3.0   NaN   NaN  15.0   NaN   8.0   3.0   NaN  3.0   NaN
4   7.0   8.0   7.0   NaN   9.0  19.0   NaN   0.0  NaN  11.0
5   NaN   NaN  14.0   2.0   NaN   NaN   0.0   NaN  NaN   8.0
6   3.0  13.0   NaN   NaN   NaN   NaN   NaN  12.0  3.0   NaN
7  13.0  14.0   NaN   5.0  13.0   NaN  18.0   6.0  NaN   5.0
8   3.0   9.0  14.0  19.0  11.0   NaN   NaN   NaN  NaN   5.0
9   3.0  17.0   NaN   NaN   0.0   NaN  11.0   NaN  NaN   0.0

And you want to compare rows, say, rows 0 and 8. Then just use fillna and do a vectorized comparison:

In [12]: df.iloc[0,:].fillna(0) != df.iloc[8,:].fillna(0)
Out[12]:
c0     True
c1     True
c2    False
c3     True
c4     True
c5    False
c6     True
c7     True
c8     True
c9     True
dtype: bool

If you just want to know which columns are different, you can use the resulting boolean array to index into the columns:

In [14]: df.columns[df.iloc[0,:].fillna(0) != df.iloc[8,:].fillna(0)]
Out[14]: Index(['c0', 'c1', 'c3', 'c4', 'c6', 'c7', 'c8', 'c9'], dtype='object')
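One caveat with the fillna(0) approach is that a genuine 0.0 in one row and a NaN in the other will compare as equal, because both become 0. As a hedged alternative, the sketch below builds the mask directly from isna, so no sentinel value is needed. The two Series and their values are made up for illustration and are not taken from the DataFrame above.

```python
import numpy as np
import pandas as pd

# Two small rows with hypothetical values; NaN marks missing data.
cols = ["c0", "c1", "c2", "c3"]
a = pd.Series([np.nan, 0.0, 14.0, np.nan], index=cols)
b = pd.Series([3.0, np.nan, 14.0, np.nan], index=cols)

# Sentinel approach from the answer: NaN becomes 0,
# so c1 (0.0 vs NaN) is reported as equal.
sentinel_diff = a.fillna(0) != b.fillna(0)

# NaN-aware approach: positions differ unless both sides are NaN.
both_nan = a.isna() & b.isna()
nan_aware_diff = (a != b) & ~both_nan

differing_sentinel = list(a.index[sentinel_diff])
differing_nan_aware = list(a.index[nan_aware_diff])
print(differing_sentinel)
print(differing_nan_aware)
```

With these inputs the sentinel version reports only c0, while the NaN-aware version also reports c1. Which behavior is right depends on whether 0 is a legitimate value in your data.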
