Pandas: Filter by values within multiple columns

Question

I'm trying to filter a dataframe on the values within multiple columns, using a single condition, while keeping other columns to which I don't want to apply the filter at all.

I've reviewed these answers, with the third being the closest, but still no luck:

Setup:

import pandas as pd

df = pd.DataFrame({
        'month':[1,1,1,2,2],
        'a':['A','A','A','A','NONE'],
        'b':['B','B','B','B','B'],
        'c':['C','C','C','NONE','NONE']
    }, columns = ['month','a','b','c'])

l = ['month','a','c']
df = df.loc[df['month'] == df['month'].max(), df.columns.isin(l)].reset_index(drop = True)

Current output:

   month     a     c
0      2     A  NONE
1      2  NONE  NONE

Desired output:

   month     a
0      2     A
1      2  NONE

I have tried:

sub = l[1:]
df = df[(df.loc[:, sub] != 'NONE').any(axis = 1)]

and many other variations (.all(), [sub, :], ~df.loc[...], (axis = 0)), but all with no luck.

Basically I want to drop any column (within the sub list) that has all 'NONE' values in it.

Any help is much appreciated.

Recommended answer

You first want to replace 'NONE' with np.nan so that dropna recognizes it as a null value. Then use loc with your boolean series and column subset. Finally, call dropna with axis=1 and how='all'.

import numpy as np

# df is the original DataFrame from the setup, before it is reassigned
df.replace('NONE', np.nan) \
    .loc[df.month == df.month.max(), l].dropna(axis=1, how='all')

   month    a
3      2    A
4      2  NaN
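
Because replace turns every 'NONE' into a null value, any 'NONE' left in a column that is kept is no longer the original string. If the literal 'NONE' should survive, as in the desired output of the question, one possible variant is to filter first and then drop only those columns of sub that consist entirely of 'NONE'. This is a sketch rather than part of the original answer; the names latest, all_none and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'month': [1, 1, 1, 2, 2],
    'a': ['A', 'A', 'A', 'A', 'NONE'],
    'b': ['B', 'B', 'B', 'B', 'B'],
    'c': ['C', 'C', 'C', 'NONE', 'NONE'],
}, columns=['month', 'a', 'b', 'c'])

l = ['month', 'a', 'c']
sub = l[1:]  # columns the 'NONE' check applies to

# Step 1: keep only the rows of the latest month and the columns of interest.
latest = df.loc[df['month'] == df['month'].max(), l]

# Step 2: flag each column of sub whose remaining values are all 'NONE'.
# axis=0 reduces down the rows, giving one boolean per column.
all_none = (latest[sub] == 'NONE').all(axis=0)

# Step 3: drop the flagged columns; 'NONE' strings elsewhere stay untouched.
result = latest.drop(columns=all_none[all_none].index).reset_index(drop=True)

print(result)
#    month     a
# 0      2     A
# 1      2  NONE
```

The attempt in the question reduced with axis=1, which produces one boolean per row and therefore filters rows; reducing with axis=0 produces one boolean per column, which is what a column drop needs.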
