pandas: slice rows based on a joint condition
Question
Consider the DataFrame df below:
one two three four five six seven eight
0 0.1 1.1 2.2 3.3 3.6 4.1 0.0 0.0
1 0.1 2.1 2.3 3.2 3.7 4.3 0.0 0.0
2 1.6 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.1 1.2 2.5 3.7 4.4 0.0 0.0 0.0
4 1.7 2.1 0.0 0.0 0.0 0.0 0.0 0.0
5 2.1 3.2 0.0 0.0 0.0 0.0 0.0 0.0
6 2.1 2.3 3.2 4.3 0.0 0.0 0.0 0.0
7 2.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0
8 0.1 1.8 0.0 0.0 0.0 0.0 0.0 0.0
9 1.6 0.0 0.0 0.0 0.0 0.0 0.0 0.0
I want to select all rows where any column's value is 3.2, but the selected rows must not contain the values 0.1 or 1.2.
I can get the first part with the query below:
df[df.values == 3.2]
but I cannot combine this with the second part of the query (the joint != condition).
I also get the following warning:
DeprecationWarning: elementwise != comparison failed; this will raise an error in the future.
on the larger dataset (but not on the smaller replica) when trying the following:
df[df.values != [0.1,1.2]]
//
@pensen, here is the output. Rows 1, 15, 27, and 35 contain the value 0.1, although the condition should have filtered them out.
contains = df.eq(3.2).any(axis=1)
not_contains = ~df.isin([0.1,1.2]).any(axis=1)
print(df[contains & not_contains])
0 1 2 3 4 5 6 7
1 0.1 2.1 3.2 0.0 0.0 0.0 0.0 0.0
15 0.1 1.1 2.2 3.2 3.3 3.6 3.7 0.0
27 0.1 2.1 2.3 3.2 3.6 3.7 4.3 0.0
31 3.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0
35 0.1 1.7 2.1 3.2 3.6 3.7 4.3 0.0
Here is the original dataset (rows 0:36) to reproduce the output above:
0 1 2 3 4 5 6 7
0 4.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 0.1 2.1 3.2 0.0 0.0 0.0 0.0 0.0
2 0.1 2.4 2.5 0.0 0.0 0.0 0.0 0.0
3 2.4 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 4.4 0.0 0.0 0.0 0.0 0.0 0.0 0.0
5 1.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
6 0.1 2.1 4.1 0.0 0.0 0.0 0.0 0.0
7 4.4 0.0 0.0 0.0 0.0 0.0 0.0 0.0
8 1.7 0.0 0.0 0.0 0.0 0.0 0.0 0.0
9 2.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0
10 1.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
11 1.1 4.1 0.0 0.0 0.0 0.0 0.0 0.0
12 0.1 2.2 3.3 3.6 0.0 0.0 0.0 0.0
13 0.1 1.8 3.3 0.0 0.0 0.0 0.0 0.0
14 0.1 1.2 1.3 2.5 3.7 4.2 0.0 0.0
15 0.1 1.1 2.2 3.2 3.3 3.6 3.7 0.0
16 1.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0
17 1.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0
18 1.3 2.5 0.0 0.0 0.0 0.0 0.0 0.0
19 0.1 1.2 2.5 3.7 4.4 0.0 0.0 0.0
20 1.2 4.4 0.0 0.0 0.0 0.0 0.0 0.0
21 4.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0
22 1.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0
23 0.1 2.2 2.4 2.5 3.7 0.0 0.0 0.0
24 0.1 2.4 4.3 0.0 0.0 0.0 0.0 0.0
25 1.7 0.0 0.0 0.0 0.0 0.0 0.0 0.0
26 0.1 1.1 4.1 0.0 0.0 0.0 0.0 0.0
27 0.1 2.1 2.3 3.2 3.6 3.7 4.3 0.0
28 1.4 2.2 3.6 4.1 0.0 0.0 0.0 0.0
29 1.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0
30 1.2 4.4 0.0 0.0 0.0 0.0 0.0 0.0
31 3.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0
32 3.6 4.1 0.0 0.0 0.0 0.0 0.0 0.0
33 2.1 2.4 0.0 0.0 0.0 0.0 0.0 0.0
34 0.1 1.8 0.0 0.0 0.0 0.0 0.0 0.0
35 0.1 1.7 2.1 3.2 3.6 3.7 4.3 0.0
Here is a link to the actual dataset.
Recommended answer
For performance, especially since you mentioned a large dataset, and if you only need to exclude two numbers, here is one approach that works on the underlying array data:
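a = df.values  # underlying NumPy array, one row per DataFrame row
# keep rows that contain 3.2 somewhere and contain neither 0.1 nor 1.2 anywhere
df_out = df.iloc[(a == 3.2).any(axis=1) & ((a != 0.1) & (a != 1.2)).all(axis=1)]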
Sample run:
In [43]: a = df.values
In [44]: df.iloc[(a == 3.2).any(1) & (((a!=0.1) & (a!=1.2)).all(1))]
Out[44]:
one two three four five six seven eight
5 2.1 3.2 0.0 0.0 0 0 0 0
6 2.1 2.3 3.2 4.3 0 0 0 0
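For anyone who wants to try this end to end, here is a minimal self-contained sketch that rebuilds the ten-row sample frame from the question and applies the same row mask. The helper name select_rows and its tol parameter are hypothetical, not part of pandas or of the original answer. With tol=0.0 the comparison is exact, like == and != above. A small positive tol switches to a tolerance-based match via np.isclose, which may help if the values in the larger dataset are not stored as exactly 0.1 or 3.2. The thread does not confirm that this is why isin missed some rows, so treat that part as an assumption.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; trailing zeros are padded in below.
cols = ["one", "two", "three", "four", "five", "six", "seven", "eight"]
leading = [
    [0.1, 1.1, 2.2, 3.3, 3.6, 4.1],
    [0.1, 2.1, 2.3, 3.2, 3.7, 4.3],
    [1.6],
    [0.1, 1.2, 2.5, 3.7, 4.4],
    [1.7, 2.1],
    [2.1, 3.2],
    [2.1, 2.3, 3.2, 4.3],
    [2.2],
    [0.1, 1.8],
    [1.6],
]
rows = [r + [0.0] * (len(cols) - len(r)) for r in leading]
df = pd.DataFrame(rows, columns=cols)


def select_rows(frame, wanted, excluded, tol=0.0):
    """Hypothetical helper: rows containing `wanted` and none of `excluded`.

    tol is an absolute tolerance; tol=0.0 means exact equality.
    """
    a = frame.to_numpy(dtype=float)
    # True for rows where at least one cell matches the wanted value.
    has_wanted = np.isclose(a, wanted, rtol=0.0, atol=tol).any(axis=1)
    # True for rows where at least one cell matches any excluded value.
    has_excluded = np.zeros(len(frame), dtype=bool)
    for value in excluded:
        has_excluded |= np.isclose(a, value, rtol=0.0, atol=tol).any(axis=1)
    return frame[has_wanted & ~has_excluded]


result = select_rows(df, 3.2, [0.1, 1.2])
print(result)  # rows 5 and 6, as in the sample run above
```

If the data were stored as float32, for example, an exact comparison against the Python float 3.2 would match nothing, whereas select_rows(df.astype(np.float32), 3.2, [0.1, 1.2], tol=1e-6) would still return rows 5 and 6.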