Pandas: select rows and columns based on a boolean condition


Question

I have a pandas dataframe with about 50 columns and more than 100 rows. I want to select the columns 'col_x' and 'col_y' for the rows where 'col_z' < m. Is there a simple way to do this, similar to df[df['col_z'] < m] and df[['col_x', 'col_y']], but combined?

Recommended answer

Let's break down your problem. You want to:

  1. Filter rows based on some boolean condition.
  2. Select a subset of columns from the result.

For the first point, the condition you need is:

df["col_z"] < m

For the second point, you specify the list of columns you need:

["col_x", "col_y"]

How do you combine the two to get the expected output with pandas? The most straightforward way is to use loc:

df.loc[df["col_z"] < m, ["col_x", "col_y"]]

The first argument selects rows, and the second argument selects columns.
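To make the loc call concrete, here is a minimal runnable sketch. The data values and the threshold m = 20 are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical data: column names follow the question, values are invented.
df = pd.DataFrame({
    "col_x": [1, 2, 3, 4],
    "col_y": [10, 20, 30, 40],
    "col_z": [5, 15, 25, 35],
})
m = 20  # assumed threshold, not from the original question

# Row selector: boolean Series; column selector: list of labels.
result = df.loc[df["col_z"] < m, ["col_x", "col_y"]]
print(result)
```

Only the rows whose col_z is below m survive, and only the two requested columns are kept.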

More about loc

Think of this in terms of the relational algebra operations selection and projection. If you come from the SQL world, the analogy should be familiar. In SQL syntax, the operation above would look like this:

SELECT col_x, col_y     -- projection on columns
FROM df
WHERE col_z < m         -- selection on rows
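The same selection and projection can also be written as two separate pandas steps, which is what the question's two snippets do. The sketch below, using made-up data and an assumed m = 9, checks that the two-step version and the single loc call give the same frame.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"col_x": [1, 2, 3], "col_y": [4, 5, 6], "col_z": [7, 8, 9]})
m = 9  # assumed threshold

selected = df[df["col_z"] < m]             # selection: WHERE col_z < m
projected = selected[["col_x", "col_y"]]   # projection: SELECT col_x, col_y

# One call doing both at once.
combined = df.loc[df["col_z"] < m, ["col_x", "col_y"]]

same = projected.equals(combined)
print(same)
```

The single loc call is usually preferable when you later want to assign to the selection, since chained indexing can end up operating on a copy.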

loc also lets you select rows by index label. For example, suppose you have this dataframe:

   col_x  col_y
a      1      4
b      2      5
c      3      6

To select the rows labeled a and c and the column col_x, you'd use:

df.loc[['a', 'c'], ['col_x']]

   col_x
a      1
c      3
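For readers who want to run this, the sketch below rebuilds the small dataframe shown above and repeats the label-based selection. It also shows one detail worth knowing: passing a list of columns returns a DataFrame, while passing a single label returns a Series. The variable names are illustrative.

```python
import pandas as pd

# Rebuild the example frame with string index labels.
df = pd.DataFrame(
    {"col_x": [1, 2, 3], "col_y": [4, 5, 6]},
    index=["a", "b", "c"],
)

as_frame = df.loc[["a", "c"], ["col_x"]]   # list of columns -> DataFrame
as_series = df.loc[["a", "c"], "col_x"]    # single column label -> Series

print(as_frame)
print(as_series)
```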

Alternatively, you can select by a boolean condition (a Series or array of bool values, as your original question asks). For example, to select col_y for the rows where the value in col_x is odd:

df.loc[(df.col_x % 2).ne(0), ['col_y']]

   col_y
a      4
c      6

In detail, df.col_x % 2 computes each value modulo 2. ne(0) then compares each result with 0 and returns True where it is not equal, which is how the odd numbers get selected. Here is what that expression evaluates to:

(df.col_x % 2).ne(0)

a     True
b    False
c     True
Name: col_x, dtype: bool
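Because the mask is an ordinary boolean Series, it can be stored in a variable and reused. The sketch below, built on the same example frame, also shows that the method form ne(0) and the operator form != 0 produce the same mask; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"col_x": [1, 2, 3], "col_y": [4, 5, 6]},
    index=["a", "b", "c"],
)

is_odd = (df["col_x"] % 2).ne(0)    # method form, as in the answer
is_odd_op = df["col_x"] % 2 != 0    # operator form, same mask

# The stored mask goes in the row position of loc.
odd_rows = df.loc[is_odd, ["col_y"]]
print(odd_rows)
```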


Further reading

  • 10 Minutes to Pandas - Selection by Label
  • Selection with .loc in python
  • Loc vs. iloc vs. ix vs. at vs. iat?
