Drop Rows by Multiple Column Criteria in DataFrame


Problem description

I have a pandas DataFrame from which I'm trying to drop rows based on a criterion across selected columns. If the values in all of these columns are zero, the row should be dropped. Here is an example.

import pandas as pd
t = pd.DataFrame({'a':[1,0,0,2],'b':[1,2,0,0],'c':[1,2,3,4]})

  a b c
0 1 1 1 
1 0 2 2 
2 0 0 3 
3 2 0 4

I tried something like:

cols_of_interest = ['a','b'] #Drop rows if zero in all these columns
t = t[t[cols_of_interest]!=0]

This doesn't drop the rows, so I tried:

t = t.drop(t[t[cols_of_interest]==0].index)

And all rows are dropped.

What I would like to end up with is:

  a b c
0 1 1 1 
1 0 2 2 
3 2 0 4

Where the 3rd row (index 2) was dropped because it took on value 0 in BOTH the columns of interest, not just one.

Solution

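Your problem here is that you assigned the result of your boolean condition back to t: t = t[t[cols_of_interest]!=0]. Indexing with a boolean DataFrame does not drop any rows; it keeps the original shape and puts NaN in every cell where the condition is not met (including all of column c, which the mask does not cover), and the assignment overwrites your original df. This is also why your drop attempt removed everything: t[t[cols_of_interest]==0] still contains every row, so its .index is the full index.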
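To see this concretely, here is a minimal sketch that rebuilds the frame from the question and inspects what the boolean-DataFrame indexing returns. The name masked is mine, introduced only for illustration:

```python
import pandas as pd

t = pd.DataFrame({'a': [1, 0, 0, 2], 'b': [1, 2, 0, 0], 'c': [1, 2, 3, 4]})
cols_of_interest = ['a', 'b']

# Indexing with a boolean DataFrame masks individual cells with NaN;
# it does not filter rows, so the result keeps every row of t.
masked = t[t[cols_of_interest] != 0]

print(masked.shape)         # same shape as t
print(masked.isna().sum())  # number of NaN cells per column
```

Because masked has the same index as t, passing masked.index to drop removes every row.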

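What you want to do is generate the boolean mask, use it to turn the non-matching cells into NaN, and then call dropna with thresh=1 so that only rows with at least one non-NaN value are kept. You can then pass the index of that result to loc to select the desired rows from the original df (this assumes t has not already been overwritten by the assignment above):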

In [124]:

cols_of_interest = ['a','b']
t.loc[t[t[cols_of_interest]!=0].dropna(thresh=1).index]
Out[124]:
   a  b  c
0  1  1  1
1  0  2  2
3  2  0  4

EDIT

As pointed out by @DSM, you can achieve this more simply: call any with axis=1 on the condition to get one boolean per row, then use that to index into your df:

In [125]:

t[(t[cols_of_interest] != 0).any(axis=1)]
Out[125]:
   a  b  c
0  1  1  1
1  0  2  2
3  2  0  4
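For reference, here is the same idea as a self-contained script. The names keep, result and result_alt are mine; the second form is just the same condition written as "not all zero", which some readers may find closer to the wording of the question:

```python
import pandas as pd

t = pd.DataFrame({'a': [1, 0, 0, 2], 'b': [1, 2, 0, 0], 'c': [1, 2, 3, 4]})
cols_of_interest = ['a', 'b']

# One boolean per row: True if at least one column of interest is non-zero.
keep = (t[cols_of_interest] != 0).any(axis=1)
result = t[keep]

# Equivalent phrasing: drop the rows where every column of interest is zero.
result_alt = t[~(t[cols_of_interest] == 0).all(axis=1)]

print(result)
```

Indexing with a boolean Series (one value per row) filters rows, unlike indexing with a boolean DataFrame, which only masks cells.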
