Pandas select rows and columns based on boolean condition
Question
I have a pandas dataframe with about 50 columns and >100 rows. I want to select columns 'col_x', 'col_y' where 'col_z' < m. Is there a simple way to do this, similar to df[df['col3'] < m] and df[['colx','coly']] but combined?
Recommended answer
Let's break down your problem. You want to
- filter rows based on some boolean condition, and
- select a subset of columns from the result.
For the first point, the condition you'd need is:
df["col_z"] < m
For the second requirement, you'd specify the list of columns that you need:
["col_x", "col_y"]
How would you combine these two to produce the expected output with pandas? The most straightforward way is to use loc:
df.loc[df["col_z"] < m, ["col_x", "col_y"]]
The first argument selects rows, and the second argument selects columns.
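As a minimal, runnable sketch of the call above: the data, the extra column col_w, and the threshold m = 20 are made up for illustration; only the column names col_x, col_y, and col_z come from the question.

```python
import pandas as pd

# Hypothetical data; only the names col_x, col_y, col_z come from the question.
df = pd.DataFrame({
    "col_x": [1, 2, 3, 4],
    "col_y": [10, 20, 30, 40],
    "col_z": [5, 15, 25, 35],
    "col_w": ["p", "q", "r", "s"],  # an extra column that should be dropped
})
m = 20  # assumed threshold

# Rows: where col_z < m. Columns: only col_x and col_y.
result = df.loc[df["col_z"] < m, ["col_x", "col_y"]]
print(result)
```

Only the rows whose col_z is 5 or 15 survive the filter, and col_z and col_w are not part of the result.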
More About loc
Think of this in terms of the relational algebra operations selection and projection. If you're coming from the SQL world, this should feel familiar. The above operation, in SQL syntax, would look like this:
SELECT col_x, col_y  -- projection on columns
FROM df
WHERE col_z < m      -- selection on rows
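For readers who prefer the SQL-like phrasing, pandas also offers DataFrame.query, which takes the row condition as a string. This is not part of the original answer; the sketch below uses made-up data and an assumed threshold m = 9, and only checks that it gives the same result as loc.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "col_x": [1, 2, 3],
    "col_y": [4, 5, 6],
    "col_z": [7, 8, 9],
})
m = 9  # assumed threshold

# Selection and projection in one loc call.
via_loc = df.loc[df["col_z"] < m, ["col_x", "col_y"]]

# Selection with query (the @ prefix refers to a Python variable),
# followed by projection with a column list.
via_query = df.query("col_z < @m")[["col_x", "col_y"]]

print(via_loc.equals(via_query))
```

The loc form does both steps in a single indexing operation, which is why the answer recommends it; query is mainly a readability choice.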
pandas loc allows you to specify index labels for selecting rows. For example, if you have a dataframe:
   col_x  col_y
a      1      4
b      2      5
c      3      6
To select the rows with index labels a and c, and the column col_x, you'd use:
df.loc[['a', 'c'], ['col_x']]
   col_x
a      1
c      3
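The label-based example can be reproduced with the snippet below, which rebuilds the small dataframe shown above. The second selection is an addition for illustration: passing the column as a plain string instead of a one-element list returns a Series rather than a DataFrame.

```python
import pandas as pd

# The example dataframe from the answer, with string index labels.
df = pd.DataFrame(
    {"col_x": [1, 2, 3], "col_y": [4, 5, 6]},
    index=["a", "b", "c"],
)

# A list of columns keeps the result two-dimensional (a DataFrame).
as_frame = df.loc[["a", "c"], ["col_x"]]

# A single column label collapses the result to a Series.
as_series = df.loc[["a", "c"], "col_x"]

print(type(as_frame).__name__)
print(type(as_series).__name__)
```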
Alternatively, to select by a boolean condition (using a series/array of bool values, as your original question asks), for example the rows where the value in col_x is odd:
df.loc[(df.col_x % 2).ne(0), ['col_y']]
   col_y
a      4
c      6
In detail, df.col_x % 2 computes the remainder of each value when divided by 2. ne(0) then compares each remainder to 0 and returns True where it is not equal, so the rows with odd values are selected. Here's what that expression evaluates to:
(df.col_x % 2).ne(0)
a     True
b    False
c     True
Name: col_x, dtype: bool
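To tie the mask back to the selection, here is a small sketch on the same example dataframe. It also shows that the ne(0) method and the != operator build the same mask; the variable names are chosen for illustration.

```python
import pandas as pd

# The example dataframe from the answer.
df = pd.DataFrame(
    {"col_x": [1, 2, 3], "col_y": [4, 5, 6]},
    index=["a", "b", "c"],
)

# Two equivalent ways to build the boolean mask for odd values.
mask_method = (df.col_x % 2).ne(0)
mask_operator = df["col_x"] % 2 != 0

# The mask selects rows; the list selects columns.
odd_rows = df.loc[mask_method, ["col_y"]]

print(mask_method.equals(mask_operator))
print(odd_rows)
```

The mask is an ordinary boolean Series aligned on the index, so it can be stored in a variable and reused across several loc calls.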
Further Reading
- 10 Minutes to Pandas - Selection by Label
- Selection with .loc in python
- Loc vs. iloc vs. ix vs. at vs. iat?