How to implement a Boolean search with multiple columns in pandas
Question
I have a pandas DataFrame (df) and would like to accomplish something along these lines (in SQL terms):
SELECT * FROM df WHERE column1 = 'a' OR column2 = 'b' OR column3 = 'c' etc.
Now this works for one column/value pair:
foo = df.loc[df['column']==value]
However, I'm not sure how to expand that to multiple column/value pairs.
- For clarity, each column is matched against a different value.
Recommended answer
Because of operator precedence, you need to enclose each condition in parentheses and combine them with the bitwise and (&) and or (|) operators:
foo = df[(df['column1'] == 'a') | (df['column2'] == 'b') | (df['column3'] == 'c')]
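To make the one-liner above concrete, here is a minimal runnable sketch. The sample data is made up for illustration; only the column names and the values 'a', 'b' and 'c' come from the question.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "column1": ["a", "x", "x", "x"],
    "column2": ["y", "b", "y", "y"],
    "column3": ["z", "z", "c", "z"],
})

# Each comparison yields a boolean Series; | combines them element-wise.
# The parentheses are required because | binds tighter than ==.
mask = (df["column1"] == "a") | (df["column2"] == "b") | (df["column3"] == "c")
foo = df[mask]

print(foo)
```

Rows 0, 1 and 2 each satisfy one of the conditions, so they are kept; row 3 matches none and is dropped.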
If you use the Python keywords "and" or "or" instead, pandas raises a ValueError saying that the truth value of a Series is ambiguous. Those keywords need a single True or False for each operand, and for a whole Series it is unclear what that should be: must every value match, or is one match enough? That is why you should use the bitwise operators, or numpy's np.all or np.any, to specify the matching criteria.
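A short sketch of both points, under some assumptions: the sample data and the "wanted" dict of column/value pairs are hypothetical, and stacking the conditions into an array before calling np.any is just one way to apply it here.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "column1": ["a", "x", "x", "x"],
    "column2": ["y", "b", "y", "y"],
    "column3": ["z", "z", "c", "z"],
})

# Using the keyword "or" forces bool(Series), which pandas refuses to guess.
error_message = None
try:
    df[(df["column1"] == "a") or (df["column2"] == "b")]
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Hypothetical mapping of column name -> value to match.
wanted = {"column1": "a", "column2": "b", "column3": "c"}

# One boolean Series per pair, stacked into a 2-D array (one row per condition).
conditions = np.array([df[col] == val for col, val in wanted.items()])

# np.any over axis 0 is a row-wise OR; np.all would give a row-wise AND.
mask = np.any(conditions, axis=0)
foo = df[mask]
print(foo)
```

This form scales to any number of column/value pairs without writing out each condition by hand.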
There is also the query method: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.query.html
It has some limitations, though, mainly in cases where column names and index values could be ambiguous.
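For comparison, a minimal sketch of the same filter written with DataFrame.query. The sample data and the variable name "value" are made up for illustration. Inside a query string the words "and" and "or" are accepted, and a local Python variable can be referenced with the @ prefix.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "column1": ["a", "x", "x", "x"],
    "column2": ["y", "b", "y", "y"],
    "column3": ["z", "z", "c", "z"],
})

# The whole condition is a string, so it reads much like the SQL WHERE clause.
foo = df.query("column1 == 'a' or column2 == 'b' or column3 == 'c'")

# A local variable is referenced with @ inside the query string.
value = "a"
bar = df.query("column1 == @value or column2 == 'b'")

print(foo)
print(bar)
```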