How to select rows from a DataFrame based on column values

Question

How can I select rows from a DataFrame based on values in some column in Pandas?

In SQL, I would use:

SELECT *
FROM table
WHERE column_name = some_value

I tried to look at Pandas' documentation, but I did not immediately find the answer.

Answer

To select rows whose column value equals a scalar, some_value, use ==:

df.loc[df['column_name'] == some_value]
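A minimal runnable sketch of the == filter, using a small hypothetical DataFrame (the columns name and team and all their values are made up for illustration). The comparison builds a boolean Series aligned with the DataFrame's index, and .loc keeps the rows where it is True, which plays the role of the SQL WHERE clause in the question.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "name": ["ann", "bob", "cy", "dee"],
    "team": ["red", "blue", "red", "green"],
})

some_value = "red"

# The comparison yields a boolean Series aligned with df's index.
mask = df["team"] == some_value

# .loc keeps only the rows where the mask is True (like SQL's WHERE clause).
red_rows = df.loc[mask]
print(red_rows)
```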

To select rows whose column value is in an iterable, some_values, use isin:

df.loc[df['column_name'].isin(some_values)]
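The same hypothetical data (columns name and team are illustrative only) can show isin. It tests each element for membership in some_values and returns a boolean Series, so the result is roughly what SQL's WHERE team IN ('red', 'green') would return.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "name": ["ann", "bob", "cy", "dee"],
    "team": ["red", "blue", "red", "green"],
})

some_values = ["red", "green"]

# isin tests each element for membership and returns a boolean Series.
mask = df["team"].isin(some_values)

# Roughly equivalent to SQL: WHERE team IN ('red', 'green')
selected = df.loc[mask]
print(selected)
```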

Combine multiple conditions with &:

df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]

Note the parentheses. Due to Python's operator precedence rules, & binds more tightly than <= and >=, so the parentheses in the last example are necessary. Without the parentheses,

df['column_name'] >= A & df['column_name'] <= B

is parsed as

df['column_name'] >= (A & df['column_name']) <= B

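which raises a ValueError ("The truth value of a Series is ambiguous"). Python treats that line as a chained comparison, evaluated like (df['column_name'] >= (A & df['column_name'])) and ((A & df['column_name']) <= B), and the "and" needs a single True or False from the Series on its left, which pandas refuses to provide.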
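A small runnable sketch of both forms, assuming a hypothetical integer column x and arbitrary bounds A = 5 and B = 12. The parenthesized version filters element-wise; the unparenthesized one raises the ValueError described above. The | line is an extra illustration of element-wise OR, which follows the same parenthesization rule as &.

```python
import pandas as pd

# Hypothetical sample data; A and B are arbitrary bounds chosen for illustration.
df = pd.DataFrame({"x": [1, 5, 8, 12, 20]})
A, B = 5, 12

# Parenthesized: each comparison is evaluated first, then combined element-wise.
in_range = df.loc[(df["x"] >= A) & (df["x"] <= B)]
print(in_range)

# | is the element-wise OR and needs the same parentheses.
outside = df.loc[(df["x"] < A) | (df["x"] > B)]

# Without parentheses Python parses df["x"] >= (A & df["x"]) <= B as a chained
# comparison, which has to call bool() on a Series and therefore fails.
try:
    df.loc[df["x"] >= A & df["x"] <= B]
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```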

To select rows whose column value does not equal some_value, use !=:

df.loc[df['column_name'] != some_value]
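One detail worth checking with a sketch (the data below is hypothetical, with one missing team value): != is True for NaN, so missing values are kept. That differs from SQL, where WHERE team <> 'red' drops NULL rows. Adding notna() to the mask reproduces the SQL behaviour.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the last team is missing (NaN).
df = pd.DataFrame({
    "name": ["ann", "bob", "cy", "dee"],
    "team": ["red", "blue", "red", np.nan],
})

some_value = "red"

# != is True wherever the value differs, including NaN, because NaN != "red".
not_red = df.loc[df["team"] != some_value]
print(not_red)

# To also drop missing values (as SQL's <> would), combine with notna().
not_red_known = df.loc[(df["team"] != some_value) & df["team"].notna()]
print(not_red_known)
```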

isin returns a boolean Series, so to select rows whose value is not in some_values, negate the boolean Series using ~:

df.loc[~df['column_name'].isin(some_values)]
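A short sketch of negating an isin mask, again on hypothetical data (columns name and team are illustrative). The ~ operator flips each boolean element-wise; Python's plain not cannot be used here because it would ask for the truth value of the whole Series.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "name": ["ann", "bob", "cy", "dee"],
    "team": ["red", "blue", "red", "green"],
})

some_values = ["red", "green"]

mask = df["team"].isin(some_values)

# ~ inverts each element of the boolean Series (`not mask` would raise ValueError).
excluded = df.loc[~mask]
print(excluded)
```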

For example,

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
#      A      B  C   D
# 0  foo    one  0   0
# 1  bar    one  1   2
# 2  foo    two  2   4
# 3  bar  three  3   6
# 4  foo    two  4   8
# 5  bar    two  5  10
# 6  foo    one  6  12
# 7  foo  three  7  14

print(df.loc[df['A'] == 'foo'])

yields

     A      B  C   D
0  foo    one  0   0
2  foo    two  2   4
4  foo    two  4   8
6  foo    one  6  12
7  foo  three  7  14
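To make the mechanism visible on the same example data, the sketch below prints the boolean mask that df.loc receives and then combines it with a second condition (A is 'foo' and B is 'two'); the second condition is an illustration added here, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# The boolean mask that df.loc receives; True marks rows to keep.
mask = df['A'] == 'foo'
print(mask.tolist())
# [True, False, True, False, True, False, True, True]

# Masks combine element-wise: rows where A is 'foo' and B is 'two'.
foo_two = df.loc[mask & (df['B'] == 'two')]
print(foo_two)
```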

If you have multiple values you want to include, put them in a list (or, more generally, any iterable) and use isin:

print(df.loc[df['B'].isin(['one','three'])])

yields

     A      B  C   D
0  foo    one  0   0
1  bar    one  1   2
3  bar  three  3   6
6  foo    one  6  12
7  foo  three  7  14
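As a sketch of the "any iterable" point, the lines below pass a list, a tuple, a set and a Series to isin on the same example data; all four select the same rows. A bare string is not accepted, since isin expects a list-like collection of values and raises TypeError otherwise.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# A list, tuple, set or another Series all select the same rows.
from_list = df.loc[df['B'].isin(['one', 'three'])]
from_tuple = df.loc[df['B'].isin(('one', 'three'))]
from_set = df.loc[df['B'].isin({'one', 'three'})]
from_series = df.loc[df['B'].isin(pd.Series(['one', 'three']))]

# A single string is not list-like, so isin rejects it.
try:
    df['B'].isin('one')
    str_raised = False
except TypeError:
    str_raised = True
```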

Note, however, that if you wish to do this many times, it is more efficient to make an index first, and then use df.loc:

df = df.set_index(['B'])
print(df.loc['one'])

yields

       A  C   D
B              
one  foo  0   0
one  bar  1   2
one  foo  6  12

Or, to include multiple values from the index, use df.index.isin:

df.loc[df.index.isin(['one','two'])]

yields

       A  C   D
B              
one  foo  0   0
one  bar  1   2
two  foo  2   4
two  foo  4   8
two  bar  5  10
one  foo  6  12
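A sketch contrasting df.index.isin with passing a list of labels straight to .loc, on the same example data; the extra label 'four' is hypothetical and deliberately absent. index.isin keeps the original row order and simply ignores labels that are not present, whereas in recent pandas versions (1.0 and later) a list passed to .loc must contain only existing labels, otherwise a KeyError is raised.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
indexed = df.set_index(['B'])

# index.isin keeps the original row order and ignores absent labels ('four').
via_isin = indexed.loc[indexed.index.isin(['one', 'two', 'four'])]
print(via_isin)

# A list of labels given directly to .loc must not contain missing labels.
try:
    indexed.loc[['one', 'two', 'four']]
    missing_raised = False
except KeyError:
    missing_raised = True
```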
