Logical operators for boolean indexing in Pandas


Question

I'm working with boolean indexing in Pandas. Why does the statement

a[(a['some_column']==some_number) & (a['some_other_column']==some_other_number)]

work fine, whereas

a[(a['some_column']==some_number) and (a['some_other_column']==some_other_number)]

exits with an error?

Example:

import pandas as pd
a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

In: a[(a['x']==1)&(a['y']==10)]
Out:    x   y
     0  1  10

In: a[(a['x']==1) and (a['y']==10)]
Out: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Recommended answer

When you say

(a['x']==1) and (a['y']==10)

you are implicitly asking Python to convert each operand, (a['x']==1) and (a['y']==10), to a single boolean value.

NumPy arrays (of length greater than 1) and Pandas objects such as Series do not have a boolean value. In other words, they raise

ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().

when used as a boolean value. That's because it's unclear when the result should be True or False. Some users might assume such an object is True if it has non-zero length, like a Python list. Others might want it to be True only if all of its elements are True. Still others might want it to be True if any of its elements is True.

Because there are so many conflicting expectations, the designers of NumPy and Pandas refuse to guess, and instead raise a ValueError.

Instead, you must be explicit: call the all() or any() method, or check the empty attribute, to indicate which behavior you want.
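
As a rough illustration of the explicit options named above, the following sketch reuses the question's DataFrame. The variable names (mask, any_match, all_match, is_empty, raised) are made up for this example and do not come from the original answer.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})
mask = a['y'] == 10            # boolean Series: True for row 0, False for row 1

# Each reduction answers a different question about the mask.
any_match = bool(mask.any())   # is at least one element True?
all_match = bool(mask.all())   # are all elements True?
is_empty = mask.empty          # does the Series have zero elements?

# Asking for a single truth value without a reduction raises ValueError.
try:
    bool(mask)
    raised = False
except ValueError:
    raised = True
```

Here any_match is True and all_match is False, which shows why a single implicit truth value would be ambiguous for this mask.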

In this case, however, it looks like you do not want boolean evaluation; you want element-wise logical AND. That is what the & binary operator performs:

(a['x']==1) & (a['y']==10)

returns a boolean Series.
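
A minimal sketch of element-wise masks on the question's DataFrame follows. The answer itself only discusses &. The | (OR) and ~ (NOT) operators are included here on the assumption that they are useful companions, and the variable names are illustrative only.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

both = (a['x'] == 1) & (a['y'] == 10)    # element-wise AND
either = (a['x'] == 2) | (a['y'] == 20)  # element-wise OR
negated = ~(a['y'] == 10)                # element-wise NOT

# Indexing with a boolean Series keeps only the rows where it is True.
selected = a[both]
```

Each mask has one boolean per row, so it can be passed directly to a[...] the way the working statement in the question does.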

By the way, as alexpmil notes, the parentheses are mandatory because & has a higher operator precedence than ==. Without the parentheses, a['x']==1 & a['y']==10 would be evaluated as a['x'] == (1 & a['y']) == 10, which in turn is equivalent to the chained comparison (a['x'] == (1 & a['y'])) and ((1 & a['y']) == 10). That is an expression of the form Series and Series, and using and with two Series triggers the same ValueError as above.
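
The precedence point can be checked with a short sketch like the one below, again using the question's DataFrame. The names ok and raised are hypothetical helpers for this example.

```python
import pandas as pd

a = pd.DataFrame({'x': [1, 1], 'y': [10, 20]})

# With parentheses: two boolean Series combined element-wise.
ok = (a['x'] == 1) & (a['y'] == 10)

# Without parentheses Python parses the expression as
# a['x'] == (1 & a['y']) == 10, a chained comparison that
# implicitly applies `and` to a Series.
try:
    a['x'] == 1 & a['y'] == 10
    raised = False
except ValueError:
    raised = True
```

The unparenthesized form fails even though it contains no explicit and, because Python expands chained comparisons into one.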
