Pandas/Numpy NaN None comparison

Problem description

In Python, pandas, and NumPy, why do the following comparisons give different results?

from pandas import Series
from numpy import NaN

NaN is not equal to NaN:

>>> NaN == NaN
False

but a list or tuple containing NaN does compare equal:

>>> [NaN] == [NaN], (NaN,) == (NaN,)
(True, True)

while two Series containing NaN are, again, not equal:

>>> Series([NaN]) == Series([NaN])
0    False
dtype: bool

And None:

>>> None == None, [None] == [None]
(True, True)

while in a Series:

>>> Series([None]) == Series([None])
0    False
dtype: bool 

This answer explains why NaN == NaN is False in general, but it does not explain the behaviour inside Python and pandas collections.

Solution

As explained here and here, and in the Python docs, when sequences are checked for equality:

"element identity is compared first, and element comparison is performed only for distinct elements."

Because np.nan and np.NaN refer to the same object, i.e. (np.nan is np.nan is np.NaN) == True, the equality [np.nan] == [np.nan] holds. On the other hand, float('nan') creates a new object on every call, so [float('nan')] == [float('nan')] is False.
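
The identity-first rule can be reproduced with the standard library alone. The following is a minimal sketch; the variable names are illustrative and not part of the original answer.

```python
nan = float("nan")  # a single NaN object, reused below

# Same object on both sides: the identity check succeeds first,
# so == is never applied to the elements.
same_object = [nan] == [nan]

# Two distinct NaN objects: identity fails, == is applied, and NaN != NaN.
distinct_objects = [float("nan")] == [float("nan")]

# Membership tests follow the same identity-then-equality rule.
found_by_identity = nan in [nan]
found_distinct = float("nan") in [nan]

print(same_object, distinct_objects, found_by_identity, found_distinct)
# True False True False
```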

pandas and NumPy do not take this identity shortcut. They compare element by element, so NaN never equals NaN:

>>> pd.Series([np.NaN]).eq(pd.Series([np.NaN]))[0], (pd.Series([np.NaN]) == pd.Series([np.NaN]))[0]
(False, False)

However, the special equals method treats NaNs in the same location as equal:

>>> pd.Series([np.NaN]).equals(pd.Series([np.NaN]))
True
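
If an element-wise result is needed rather than the single boolean that equals returns, one possible approach is to combine == with isna. This is only a sketch under that assumption, not something the original answer proposes; the series a and b are made-up data, and np.nan is used because the np.NaN alias was removed in NumPy 2.0.

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, np.nan, 3.0])
b = pd.Series([1.0, np.nan, 4.0])

# Plain == compares element by element: the NaN position is False.
plain = a == b

# NaN-aware mask: equal values, or NaN on both sides at the same position.
nan_aware = (a == b) | (a.isna() & b.isna())

print(plain.tolist())      # [True, False, False]
print(nan_aware.tolist())  # [True, True, False]
print(a.equals(a.copy()))  # True
```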

None is treated differently. NumPy considers two None values equal:

>>> pd.Series([None, None]).values == (pd.Series([None, None])).values
array([ True,  True])

while pandas does not:

>>> pd.Series([None, None]) == (pd.Series([None, None]))
0    False
1    False
dtype: bool

There is also an inconsistency between the == operator and the eq method, which is discussed here:

>>> pd.Series([None, None]).eq(pd.Series([None, None]))
0    True
1    True
dtype: bool

Tested on pandas 0.23.4 and numpy 1.15.0.
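
Since == and eq disagree on None in the versions tested above, a comparison that does not depend on either behaviour can test for missing values explicitly. The sketch below is one way to do that, assuming that two missing values in the same position should count as equal; s1 and s2 are illustrative data.

```python
import pandas as pd

s1 = pd.Series([None, "x", None], dtype=object)
s2 = pd.Series([None, "x", "y"], dtype=object)

# Positions where both sides are missing, and where both sides hold a value.
both_missing = s1.isna() & s2.isna()
both_present = s1.notna() & s2.notna()

# Use == only where both values are present, so the result does not
# depend on how == treats None.
same = both_missing | (both_present & (s1 == s2))

print(same.tolist())  # [True, True, False]
```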
