Pandas: Filter by values within multiple columns
Question
I'm trying to filter a dataframe based on the values within multiple columns, using a single condition, while keeping other columns to which I don't want to apply the filter at all.
I've reviewed these answers, with the third being the closest, but still no luck:
Setup:
import pandas as pd
df = pd.DataFrame({
'month':[1,1,1,2,2],
'a':['A','A','A','A','NONE'],
'b':['B','B','B','B','B'],
'c':['C','C','C','NONE','NONE']
}, columns = ['month','a','b','c'])
l = ['month','a','c']
df = df.loc[df['month'] == df['month'].max(), df.columns.isin(l)].reset_index(drop = True)
Current output:
   month     a     c
0      2     A  NONE
1      2  NONE  NONE
Desired output:
   month     a
0      2     A
1      2  NONE
I tried:
sub = l[1:]
df = df[(df.loc[:, sub] != 'NONE').any(axis = 1)]
and many other variations (.all(), [sub, :], ~df.loc[...], (axis = 0)), but all with no luck.
Basically, I want to drop any column (within the sub list) that has all 'NONE' values in it.
Any help is much appreciated.
Recommended answer
You first want to substitute your 'NONE' with np.nan so that it is recognized as a null value by dropna. Then use loc with your boolean series and column subset. Then use dropna with axis=1 and how='all' to drop every column whose remaining values are all null.
import numpy as np
# df is the original frame from the setup, before the filtering reassignment
df.replace('NONE', np.nan) \
    .loc[df.month == df.month.max(), l].dropna(axis=1, how='all')
   month    a
3      2    A
4      2  NaN
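Because replace converts 'NONE' to NaN, the approach above leaves a missing value in column a rather than the original string. If the 'NONE' strings need to survive, as in the desired output, one alternative is to test the columns directly and drop the ones that are entirely 'NONE'. The following is a minimal sketch under that assumption, not part of the original answer; the names latest, all_none, and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'month': [1, 1, 1, 2, 2],
    'a': ['A', 'A', 'A', 'A', 'NONE'],
    'b': ['B', 'B', 'B', 'B', 'B'],
    'c': ['C', 'C', 'C', 'NONE', 'NONE'],
}, columns=['month', 'a', 'b', 'c'])

l = ['month', 'a', 'c']
sub = l[1:]  # only these columns are checked for 'NONE'

# Keep the rows for the latest month and the columns of interest.
latest = df.loc[df['month'] == df['month'].max(), l]

# all() with the default axis=0 reduces down the rows, giving one
# boolean per column: True where every value in that column is 'NONE'.
all_none = (latest[sub] == 'NONE').all()

# Drop the flagged columns; the remaining 'NONE' strings are untouched.
result = latest.drop(columns=all_none[all_none].index).reset_index(drop=True)
print(result)
```

The attempt in the question used .any(axis = 1) inside df[...], which builds a per-row mask and therefore filters rows. Reducing along the rows instead produces a per-column mask, which is what dropping columns requires.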