How to implement a Boolean search with multiple columns in pandas

Question

I have a pandas df and would like to accomplish something along these lines (in SQL terms):

SELECT * FROM df WHERE column1 = 'a' OR column2 = 'b' OR column3 = 'c' etc.

This works for a single column/value pair:

foo = df.loc[df['column']==value]

However, I'm not sure how to expand that to multiple column/value pairs.

  • For clarity: each column is matched against a different value.

Recommended answer

Because of operator precedence, you need to wrap each condition in parentheses and combine them with the bitwise and (&) and or (|) operators:

foo = df[(df['column1'] == 'a') | (df['column2'] == 'b') | (df['column3'] == 'c')]
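A minimal, runnable sketch of the expression above. The DataFrame contents are made up for illustration; only the column names come from the question. It shows the OR version (any condition matches) next to the AND version (every condition matches).

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question
df = pd.DataFrame({
    "column1": ["a", "x", "x", "x"],
    "column2": ["y", "b", "y", "y"],
    "column3": ["z", "z", "c", "z"],
})

# OR: keep rows where at least one condition holds.
# Each comparison is parenthesized because | binds tighter than ==.
foo = df[(df["column1"] == "a") | (df["column2"] == "b") | (df["column3"] == "c")]
print(foo.index.tolist())  # [0, 1, 2]

# AND: keep rows where every condition holds (no row does in this sample).
bar = df[(df["column1"] == "a") & (df["column2"] == "b") & (df["column3"] == "c")]
print(bar.empty)  # True
```

Without the parentheses, Python would evaluate `"a" | df["column2"]` first, which is not the comparison you intended.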

If you use 'and' or 'or', pandas will raise a ValueError complaining that the truth value of a Series is ambiguous. Python's 'and'/'or' need a single True/False for the whole Series, and it is unclear whether that should mean every value in the Series satisfies the condition, at least one does, or something in between. That is why you should use the bitwise operators for element-wise logic, or numpy's np.all or np.any to state explicitly which matching criterion you want.
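A sketch of both points, again on hypothetical sample data. The first part triggers the ambiguity error with Python's `or`. The second part stacks one boolean column per condition and lets np.any (OR) or np.all (AND) decide row by row. The `criteria` mapping is an illustrative name, not anything from pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "column1": ["a", "x", "x", "x"],
    "column2": ["y", "b", "y", "y"],
    "column3": ["z", "z", "c", "z"],
})

# Python's `or` asks for the truth value of the whole Series, which pandas refuses.
ambiguous = False
try:
    df[(df["column1"] == "a") or (df["column2"] == "b")]
except ValueError as err:
    ambiguous = True
    print(err)  # pandas' "truth value of a Series is ambiguous" message

# Hypothetical mapping: one target value per column
criteria = {"column1": "a", "column2": "b", "column3": "c"}

# Shape (n_rows, n_conditions): one boolean column per comparison
matches = np.column_stack([df[col] == val for col, val in criteria.items()])

any_match = df[np.any(matches, axis=1)]  # like SQL ... OR ... OR ...
all_match = df[np.all(matches, axis=1)]  # like SQL ... AND ... AND ...
```

Keeping the conditions in one mapping means adding another column/value pair does not require editing a long chain of | operators.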

There is also the query method: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.query.html

However, it has some limitations, mainly where there could be ambiguity between column names and index names.
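A minimal sketch of the same OR filter written with DataFrame.query, using hypothetical sample data. Inside the query string, pandas parses the expression itself, so the words 'or'/'and' are accepted there (as are | and &). Local Python variables can be referenced with the @ prefix; `target` is an illustrative name.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "column1": ["a", "x", "x", "x"],
    "column2": ["y", "b", "y", "y"],
    "column3": ["z", "z", "c", "z"],
})

# 'or' is allowed here because query parses the string, not Python
foo = df.query("column1 == 'a' or column2 == 'b' or column3 == 'c'")

# Reference a local variable with @
target = "a"
bar = df.query("column1 == @target")
```

According to the pandas documentation, if an index name overlaps with a column name inside a query expression, the column name takes precedence. That is the kind of ambiguity mentioned above.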
