Drop Rows by Multiple Column Criteria in DataFrame
I have a pandas DataFrame from which I'm trying to drop rows based on a criterion across selected columns. If the values in all of these columns are zero, the row should be dropped. Here is an example.
import pandas as pd
t = pd.DataFrame({'a':[1,0,0,2],'b':[1,2,0,0],'c':[1,2,3,4]})
   a  b  c
0  1  1  1
1  0  2  2
2  0  0  3
3  2  0  4
I first tried something like:
cols_of_interest = ['a','b']  # Drop rows that are zero in all of these columns
t = t[t[cols_of_interest]!=0]
This doesn't drop the rows, so I tried:
t = t.drop(t[t[cols_of_interest]==0].index)
And all rows are dropped.
What I would like to end up with is:
   a  b  c
0  1  1  1
1  0  2  2
3  2  0  4
Here the third row (index 2) is dropped because it has the value 0 in BOTH columns of interest, not just one.
Your problem here is that you first assigned the result of your boolean condition, t = t[t[cols_of_interest]!=0], which overwrites your original df. Indexing with a boolean DataFrame does not drop rows; it keeps the original shape and puts NaN wherever the condition is not met.
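To see why the first attempt leaves every row in place, it may help to look at what indexing with a boolean DataFrame returns. The following is a minimal sketch using the example data from the question; the name masked is introduced here only for illustration.

```python
import pandas as pd

t = pd.DataFrame({'a': [1, 0, 0, 2], 'b': [1, 2, 0, 0], 'c': [1, 2, 3, 4]})
cols_of_interest = ['a', 'b']

# Indexing with a boolean DataFrame behaves like DataFrame.where:
# the shape is preserved and non-matching cells become NaN.
# Column 'c' is not part of the mask, so it ends up NaN as well.
masked = t[t[cols_of_interest] != 0]

print(masked.shape)        # same shape as t: no rows were dropped
print(list(masked.index))  # the index is unchanged too
print(masked)
```

Because the masked result keeps the full index of t, passing that index to drop removes every row, which is what the second attempt ran into.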
What you want to do is generate the boolean mask, then drop the NaN rows, passing thresh=1 so that a row is kept only if it has at least one non-NaN value. We can then use loc with the index of the result to get the desired df:
In [124]:
cols_of_interest = ['a','b']
t.loc[t[t[cols_of_interest]!=0].dropna(thresh=1).index]
Out[124]:
   a  b  c
0  1  1  1
1  0  2  2
3  2  0  4
EDIT
As pointed out by @DSM, you can achieve this more simply by using any with axis=1 to test the condition row by row, and then using the result to index into your df:
In [125]:
t[(t[cols_of_interest] != 0).any(axis=1)]
Out[125]:
   a  b  c
0  1  1  1
1  0  2  2
3  2  0  4
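For reference, here is a self-contained sketch of the any(axis=1) approach on the example data, followed by an equivalent formulation that builds the "all zero" row mask and passes the matching index to drop. The names keep, all_zero, result, and result_alt are illustrative and not part of the original answer.

```python
import pandas as pd

t = pd.DataFrame({'a': [1, 0, 0, 2], 'b': [1, 2, 0, 0], 'c': [1, 2, 3, 4]})
cols_of_interest = ['a', 'b']

# One boolean per row: True if at least one column of interest is non-zero.
keep = (t[cols_of_interest] != 0).any(axis=1)
result = t[keep]

# Equivalent: find rows that are zero in ALL columns of interest and drop them.
all_zero = (t[cols_of_interest] == 0).all(axis=1)
result_alt = t.drop(t[all_zero].index)

print(result)  # rows with index 0, 1 and 3 remain
```

The key difference from the original attempts is that keep and all_zero are boolean Series with one value per row, so indexing with them filters rows instead of masking individual cells.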