Comparing logical values to NaN in pandas/numpy


Problem description

I want to do an element-wise OR operation on two pandas Series of boolean values. The Series may also contain np.nan.

I have tried three approaches and realized that the expression "np.nan or False" can evaluate to True, False, or np.nan depending on the approach.

These are my example Series:

import numpy as np
import pandas as pd
series_1 = pd.Series([True, False, np.nan])
series_2 = pd.Series([False, False, False])

Approach #1

Using the | operator from pandas:

In [5]: series_1 | series_2
Out[5]: 
0     True
1    False
2    False
dtype: bool

Approach #2

Using the logical_or function from numpy:

In [6]: np.logical_or(series_1, series_2)
Out[6]: 
0     True
1    False
2      NaN
dtype: object

Approach #3

I define a vectorized version of logical_or, which is supposed to be evaluated row by row over the arrays:

@np.vectorize
def vectorized_or(a, b):
    return np.logical_or(a, b)

I use vectorized_or on the two Series and convert its output (a numpy array) into a pandas Series:

In [8]:  pd.Series(vectorized_or(series_1, series_2))
Out[8]: 
0     True
1    False
2     True
dtype: bool

Question

I am wondering about the reasons for these results. This answer explains np.logical_or and says np.logical_or(np.nan, False) is True, but why does this only work when vectorized and not in Approach #2? And how can the results of Approach #1 be explained?

Recommended answer

First difference: | is np.bitwise_or, whereas Approach #2 uses np.logical_or. This explains the difference between #1 and #2.
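To make the bitwise/logical distinction concrete, here is a minimal sketch on plain scalars, using only numpy. It is an illustration rather than a trace of what pandas does internally; the variable names are chosen for this example.

```python
import numpy as np

# On genuine booleans, bitwise OR and logical OR give the same answer.
same_on_bools = bool(np.bitwise_or(True, False)) == bool(np.logical_or(True, False))

# Logical OR works on truthiness, so a float NaN (which is nonzero) counts as True.
logical_result = bool(np.logical_or(np.nan, False))

# Bitwise OR has no meaning for floats, so numpy refuses the operation.
try:
    np.bitwise_or(np.nan, False)
    bitwise_error = None
except TypeError as exc:
    bitwise_error = type(exc).__name__
```

Because the bitwise operation is not defined for a float NaN, whatever calls it has to decide how to treat the missing value, which is where the two approaches can diverge.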

Second difference: since series_1.dtype is object (non-homogeneous data), operations are done row by row in the first two cases.
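The object dtype can be checked directly. The sketch below rebuilds the two Series from the question and compares np.logical_or on the object array with a plain Python `or` applied element by element. The names left, right, elementwise and python_or are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

series_1 = pd.Series([True, False, np.nan])
series_2 = pd.Series([False, False, False])

# Mixing bools with a float NaN leaves pandas with a generic object column.
dtype_1 = series_1.dtype
dtype_2 = series_2.dtype

left = series_1.to_numpy(dtype=object)
right = series_2.to_numpy()

# With an object input, numpy evaluates the operation one Python object at a time.
elementwise = np.logical_or(left, right)

# Plain Python `or` returns the first truthy operand, so NaN survives as NaN.
python_or = [a or b for a, b in zip(left, right)]
```

In this sketch the last element of both elementwise and python_or is NaN, which is consistent with the output shown for Approach #2.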

When using vectorize (#3):

The data type of the output of vectorized is determined by calling the function with the first element of the input. This can be avoided by specifying the otypes argument.
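A short sketch of the otypes argument, assuming object arrays that mirror the Series from the question. The helper names or_default and or_object are hypothetical and exist only for this example.

```python
import numpy as np

def scalar_or(a, b):
    # Called once per pair of elements by np.vectorize.
    return np.logical_or(a, b)

# Output dtype inferred from the first call: scalar_or(True, False) is a numpy bool.
or_default = np.vectorize(scalar_or)

# Output dtype fixed up front instead of inferred.
or_object = np.vectorize(scalar_or, otypes=[object])

left = np.array([True, False, np.nan], dtype=object)
right = np.array([False, False, False], dtype=object)

inferred = or_default(left, right)
as_object = or_object(left, right)
```

In this sketch otypes changes only the dtype of the returned array (bool versus object). The element values stay the same, because each call receives scalars and np.logical_or(np.nan, False) is True for scalars.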

For vectorized operations, you leave object mode. The data are first converted according to the first element (bool here; bool(nan) is True), and the operations are done afterwards.
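The effect of converting to bool before the operation can be reproduced by hand. This is a sketch of the conversion step only, with an explicit astype(bool) standing in for whatever conversion happens internally; the array names are assumptions for this example.

```python
import numpy as np

# NaN is a nonzero float, so Python treats it as truthy.
nan_is_truthy = bool(np.nan)

mixed = np.array([True, False, np.nan], dtype=object)
other = np.array([False, False, False])

# Casting the object array to bool applies truthiness to every element.
as_bool = mixed.astype(bool)

# Once both inputs are boolean, no NaN is left to propagate.
result = np.logical_or(as_bool, other)
```

Here result is a boolean array whose last element is True, which is consistent with the output of Approach #3.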
