How to select rows from a DataFrame based on column values
Question
How can I select rows from a DataFrame based on values in some column in Pandas?
In SQL, I would use:
SELECT *
FROM table
WHERE column_name = some_value
I looked through the Pandas documentation, but I did not immediately find the answer.
Recommended answer
To select rows whose column value equals a scalar, some_value, use ==:
df.loc[df['column_name'] == some_value]
To select rows whose column value is in an iterable, some_values, use isin:
df.loc[df['column_name'].isin(some_values)]
Combine multiple conditions with &:
df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]
Note the parentheses. Due to Python's operator precedence rules, & binds more tightly than <= and >=. Thus, the parentheses in the last example are necessary. Without the parentheses,
df['column_name'] >= A & df['column_name'] <= B
is parsed as
df['column_name'] >= (A & df['column_name']) <= B
which results in a "Truth value of a Series is ambiguous" error (a ValueError).
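The precedence issue above can be reproduced with a minimal sketch. The frame contents and the bounds A and B below are made-up values for illustration only; they are not part of the original answer.

```python
import pandas as pd

# Hypothetical data and bounds, chosen only to illustrate the point.
df = pd.DataFrame({'column_name': [1, 5, 10, 15]})
A, B = 3, 12

# Parenthesized: each comparison produces a boolean Series,
# and & combines the two element-wise.
in_range = df.loc[(df['column_name'] >= A) & (df['column_name'] <= B)]

# Unparenthesized: A & df['column_name'] is evaluated first, and the
# resulting chained comparison needs a single True/False from a Series.
try:
    df['column_name'] >= A & df['column_name'] <= B
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(in_range)
print(error_message)
```

Wrapping each comparison in its own parentheses is what keeps both operands of & as boolean Series.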
To select rows whose column value does not equal some_value, use !=:
df.loc[df['column_name'] != some_value]
isin returns a boolean Series, so to select rows whose value is not in some_values, negate the boolean Series using ~:
df.loc[~df['column_name'].isin(some_values)]
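As a quick sketch of the two negated forms, assuming a small made-up frame (the column contents, some_value and some_values below are illustrative placeholders):

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({'column_name': ['a', 'b', 'c', 'a']})
some_value = 'a'
some_values = ['a', 'b']

# Rows whose value differs from a single scalar.
not_equal = df.loc[df['column_name'] != some_value]

# Rows whose value is not in the iterable: ~ flips the boolean mask from isin.
not_in = df.loc[~df['column_name'].isin(some_values)]

print(not_equal)
print(not_in)
```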
For example,
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
#      A      B  C   D
# 0  foo    one  0   0
# 1  bar    one  1   2
# 2  foo    two  2   4
# 3  bar  three  3   6
# 4  foo    two  4   8
# 5  bar    two  5  10
# 6  foo    one  6  12
# 7  foo  three  7  14
print(df.loc[df['A'] == 'foo'])
yields
     A      B  C   D
0  foo    one  0   0
2  foo    two  2   4
4  foo    two  4   8
6  foo    one  6  12
7  foo  three  7  14
If you have multiple values you want to include, put them in a list (or more generally, any iterable) and use isin:
print(df.loc[df['B'].isin(['one','three'])])
yields
     A      B  C   D
0  foo    one  0   0
1  bar    one  1   2
3  bar  three  3   6
6  foo    one  6  12
7  foo  three  7  14
Note, however, that if you wish to do this many times, it is more efficient to make an index first, and then use df.loc:
df = df.set_index(['B'])
print(df.loc['one'])
yields
       A  C   D
B
one  foo  0   0
one  bar  1   2
one  foo  6  12
Or, to include multiple values from the index, use df.index.isin:
df.loc[df.index.isin(['one','two'])]
yields
       A  C   D
B
one  foo  0   0
one  bar  1   2
two  foo  2   4
two  foo  4   8
two  bar  5  10
one  foo  6  12
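As a sanity check, the index-based selection can be compared against the column-based one on the same example frame. This is only a sketch: the variable names by_column, indexed, by_index and same_rows are introduced here for illustration, and the code checks that both approaches return the same rows without measuring speed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})

# Column-based selection: boolean mask built from column B.
by_column = df.loc[df['B'].isin(['one', 'two'])]

# Index-based selection: move B into the index, then mask on the index.
indexed = df.set_index('B')
by_index = indexed.loc[indexed.index.isin(['one', 'two'])]

# Both masks keep the original row order, so column C should match exactly.
same_rows = by_column['C'].tolist() == by_index['C'].tolist()
print(same_rows)
```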