pandas logical AND operator with and without brackets produces different results


Problem description

I have just noticed this:

df[df.condition1 & df.condition2]
df[(df.condition1) & (df.condition2)]

Why does the output of these two lines differ?

I cannot share the exact data, but I will try to provide as much detail as I can:

df[df.col1 == False & df.col2.isnull()] # returns 33 rows and the rule `df.col2.isnull()` is not in effect
df[(df.col1 == False) & (df.col2.isnull())] # returns 29 rows and both conditions are applied correctly 


Thanks to @jezrael and @ayhan, here is what happened. Let me use the example provided by @jezrael:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1':[True, False, False, False],
                   'col2':[4, np.nan, np.nan, 1]})

print (df)
    col1  col2
0   True   4.0
1  False   NaN
2  False   NaN
3  False   1.0

If we take a look at row 3:

    col1  col2
3  False   1.0

And at the way I wrote the condition:

df.col1 == False & df.col2.isnull() # is equivalent to False == False & False

Because & has higher precedence than ==, without brackets False == False & False is equivalent to:

False == (False & False)
print(False == (False & False)) # prints True

With brackets:

print((False == False) & False) # prints False

I think it is a bit easier to illustrate this problem with numbers:

print(5 == 5 & 1) # prints False, because 5 & 1 returns 1 and 5==1 returns False
print(5 == (5 & 1)) # prints False, same reason as above
print((5 == 5) & 1) # prints 1, because 5 == 5 returns True, and True & 1 returns 1
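
The same effect can be reproduced on the whole example DataFrame rather than on a single row. This is only a minimal sketch using the sample data above; the variable names no_parens, explicit and with_parens are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [True, False, False, False],
                   'col2': [4, np.nan, np.nan, 1]})

# Without brackets, & binds first, so Python evaluates
#   df.col1 == (False & df.col2.isnull())
no_parens = df.col1 == False & df.col2.isnull()

# The same grouping written out explicitly. False & <boolean Series> is
# all False, so the isnull() test is discarded and only col1 == False remains.
explicit = df.col1 == (False & df.col2.isnull())

# With brackets, both conditions are applied element-wise.
with_parens = (df.col1 == False) & (df.col2.isnull())

print(df[no_parens].index.tolist())    # [1, 2, 3]
print(df[explicit].index.tolist())     # [1, 2, 3]
print(df[with_parens].index.tolist())  # [1, 2]
```

Row 3 is the one that slips through without brackets: col1 is False there, but col2 is not null.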

So, lesson learned: always add brackets!!!
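
If you are unsure how Python will group a condition, one way to check without touching any data is to parse it with the standard-library ast module and look at the outermost node. This is just a sketch; top_level_node is a hypothetical helper name, and a and b stand in for any column expressions:

```python
import ast

def top_level_node(expr: str) -> str:
    """Return the type name of the outermost node of a parsed expression."""
    return type(ast.parse(expr, mode="eval").body).__name__

# Without brackets the outermost operation is the comparison,
# so & has already been applied to its right-hand side.
print(top_level_node("a == False & b"))    # Compare

# With brackets the outermost operation is the & itself.
print(top_level_node("(a == False) & b"))  # BinOp
```

A result of Compare for an expression that was meant to combine two conditions with & is a sign that brackets are missing.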

I wish I could split the answer points between @jezrael and @ayhan :(

Recommended answer

There is no difference between df[condition1 & condition2] and df[(condition1) & (condition2)]. The difference arises when you write the comparisons inline, because the & operator binds more tightly than comparison operators such as ==, > and <:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 10, size=(5, 3)), columns=list('abc'))
df
Out: 
   a  b  c
0  5  0  3
1  3  7  9
2  3  5  2
3  4  7  6
4  8  8  1

condition1 = df['a'] > 3
condition2 = df['b'] < 5

df[condition1 & condition2]
Out: 
   a  b  c
0  5  0  3

df[(condition1) & (condition2)]
Out: 
   a  b  c
0  5  0  3

However, if you type it like this you'll see an error:

df[df['a'] > 3 & df['b'] < 5]
Traceback (most recent call last):

  File "<ipython-input-7-9d4fd21246ca>", line 1, in <module>
    df[df['a'] > 3 & df['b'] < 5]

  File "/home/ayhan/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 892, in __nonzero__
    .format(self.__class__.__name__))

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

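This is because 3 & df['b'] is evaluated first (this corresponds to False & df.col2.isnull() in your example). Python then reads what is left, df['a'] > (3 & df['b']) < 5, as a chained comparison, which joins its two halves with Python's and keyword; that requires the truth value of a Series, hence the ValueError. So you need to group the conditions in parentheses: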

df[(df['a'] > 3) & (df['b'] < 5)]
Out[8]: 
   a  b  c
0  5  0  3
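
To see where the ValueError comes from, the unbracketed expression can be expanded by hand. This is a rough sketch that hard-codes the five rows printed above instead of drawing random numbers; tmp, raised and result are illustrative names only:

```python
import pandas as pd

df = pd.DataFrame({'a': [5, 3, 3, 4, 8],
                   'b': [0, 7, 5, 7, 8],
                   'c': [3, 9, 2, 6, 1]})

# Python reads  df['a'] > 3 & df['b'] < 5  as the chained comparison
#   df['a'] > (3 & df['b']) < 5
# which behaves like  (df['a'] > tmp) and (tmp < 5)  with tmp = 3 & df['b'].
tmp = 3 & df['b']
try:
    (df['a'] > tmp) and (tmp < 5)  # `and` needs bool(Series), which is ambiguous
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# With brackets there are two element-wise comparisons combined with &.
result = df[(df['a'] > 3) & (df['b'] < 5)]
print(result.index.tolist())  # [0]
```

In the question there is only one comparison operator (==), so nothing is chained and no error is raised; the expression silently evaluates to the wrong mask instead.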
