Logical operators for Boolean indexing in Pandas


Question

I'm working with Boolean indexing in Pandas.

The question is why the statement

a[(a['some_column']==some_number) & (a['some_other_column']==some_other_number)]

works fine, whereas

a[(a['some_column']==some_number) and (a['some_other_column']==some_other_number)]

exits with an error?

Example:

import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

In: a[(a['x']==1)&(a['y']==10)]
Out:    x   y
     0  1  10

In: a[(a['x']==1) and (a['y']==10)]
Out: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Recommended answer

When you say

(a['x']==1) and (a['y']==10)

you are implicitly asking Python to convert (a['x']==1) and (a['y']==10) to Boolean values.

NumPy arrays (of length greater than 1) and Pandas objects such as Series do not have a Boolean value. In other words, they raise

ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().

when used as a Boolean value. That's because it is unclear when such an object should be True or False. Some users might assume it is True if it has non-zero length, like a Python list. Others might want it to be True only if all of its elements are True. Still others might want it to be True if any of its elements is True.
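The difference between a list and an array-like object can be seen directly. The sketch below is only an illustration; the helper name truth_value_or_error is made up for this example and is not part of NumPy or Pandas.

```python
import numpy as np
import pandas as pd


def truth_value_or_error(obj):
    """Return bool(obj), or the name of the exception that bool() raised.

    Hypothetical helper, introduced only for this illustration.
    """
    try:
        return bool(obj)
    except ValueError as exc:
        return type(exc).__name__


a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

# A non-empty Python list is truthy, whatever its elements are.
list_result = truth_value_or_error([False, False])

# A NumPy array with more than one element refuses to pick a truth value.
array_result = truth_value_or_error(np.array([True, False]))

# A Pandas Series refuses as well.
series_result = truth_value_or_error(a['x'] == 1)

print(list_result, array_result, series_result)
```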

Because there are so many conflicting expectations, the designers of NumPy and Pandas refuse to guess, and instead raise a ValueError.

Instead, you must be explicit, by checking the empty attribute or calling the all() or any() method to indicate which behavior you want.
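A minimal sketch of the explicit alternatives, reusing the DataFrame from the question. The variable names mask, any_true, all_true and is_empty are illustrative only.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})
mask = a['y'] == 10           # Boolean Series: True for row 0, False for row 1

any_true = bool(mask.any())   # True if at least one element is True
all_true = bool(mask.all())   # True only if every element is True
is_empty = mask.empty         # attribute, not a method: True if there are no elements

if any_true:
    print("at least one row has y == 10")
```

Each of these reduces the whole Series to a single Python bool, so the result can safely be used in an if statement or with and/or.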

In this case, however, it looks like you do not want Boolean evaluation; you want an element-wise logical AND. That is what the & binary operator performs:

(a['x']==1) & (a['y']==10)

returns a Boolean Series.
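The same pattern extends to the other element-wise operators: | for OR and ~ for NOT. A short sketch using the DataFrame from the question (the variable names are illustrative):

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

both = (a['x'] == 1) & (a['y'] == 10)     # element-wise AND
either = (a['x'] == 2) | (a['y'] == 20)   # element-wise OR
negated = ~(a['y'] == 10)                 # element-wise NOT

# A Boolean Series can be used directly to select rows.
rows = a[both]
print(rows)
```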

By the way, as alexpmil notes, the parentheses are mandatory because & has higher operator precedence than ==.

Without the parentheses, a['x']==1 & a['y']==10 would be evaluated as a['x'] == (1 & a['y']) == 10, which in turn is equivalent to the chained comparison (a['x'] == (1 & a['y'])) and ((1 & a['y']) == 10). That is an expression of the form Series and Series. Using and with two Series again triggers the same ValueError as above. That's why the parentheses are mandatory.
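The effect of the missing parentheses can be checked with a short sketch. The variable names unparenthesized_error and parenthesized are illustrative only.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

try:
    # Parsed as a['x'] == (1 & a['y']) == 10, i.e. a chained comparison
    # that implicitly applies `and` to two Series.
    a[a['x'] == 1 & a['y'] == 10]
    unparenthesized_error = None
except ValueError as exc:
    unparenthesized_error = type(exc).__name__

# With parentheses, each comparison is evaluated first and & combines the masks.
parenthesized = a[(a['x'] == 1) & (a['y'] == 10)]

print(unparenthesized_error)
print(parenthesized)
```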
