Masking multiple columns on a pandas DataFrame in Python


Question

I am looking to apply multiple masks to the columns of a pandas DataFrame in Python, one mask per column, each depending on that column's properties. In the next step I want to find the row(s) in the DataFrame that satisfy all conditions. So far I have:

df
Out[27]: 
   DE  FL  GA  IA  ID 
0   0   1   0   0   0 
1   1   0   1   0   1  
2   0   0   1   0   0 
3   0   1   0   0   0
4   0   0   0   0   0 

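import pandas as pa  # the question uses the alias `pa` rather than the usual `pd`

mask_list = []
for i in range(0, 5):
    # select column i by position, keeping it as a one-column DataFrame
    col = df.iloc[:, [i]]
    if i % 2 == 0:
        mask_list.append(col > 0)
    else:
        mask_list.append(col < 1)

# join the per-column masks side by side
concat_frame = pa.concat(mask_list, axis=1)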

concat_frame
Out[48]: 
      DE     FL     GA    IA     ID
0  False  False  False  True  False
1   True   True   True  True   True
2  False   True   True  True  False
3  False  False  False  True  False
4  False   True  False  True  False

[5 rows x 5 columns]


Update: expected outcome:

outcome
Out[60]:
   DE   FL  GA  IA  ID
1   1   0   1   0   1 

Here comes the question:
How can I apply concat_frame to df so that I select only the rows in which all Boolean criteria are matched (are True)?

Recommended answer

You can use the pandas all() method and boolean logic. As EdChum commented, I am still a bit unclear on your exact example, but a similar example (with DataFrame and concat imported from pandas) is:

In [1]: df = DataFrame([[1,2],[-3,5]], index=[0,1], columns=['a','b'])
In [2]: df
Out [2]:
   a  b
0  1  2
1 -3  5

In [3]: msk = (df>1) & (df<5)
In [4]: msk
Out [4]:
      a    b
0 False  True
1 False False

In [5]: msk.all(axis=1)
Out [5]:
0  False
1  False
dtype: bool

If you wanted to index the original DataFrame by the mask, you could do the following. Cells where the mask is False come back as NaN:

In [6]: df[msk]
Out [6]:
     a   b
0  NaN   2
1  NaN NaN
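
As a side note, indexing a DataFrame with a boolean DataFrame behaves like DataFrame.where: the shape is preserved and non-matching cells are filled with NaN. A minimal sketch of that equivalence, reusing the answer's df and msk (the names masked and same are only illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [-3, 5]], index=[0, 1], columns=["a", "b"])
msk = (df > 1) & (df < 5)

masked = df[msk]      # cells where msk is False become NaN
same = df.where(msk)  # explicit spelling of the same operation

# Both results have the same shape, values, and NaN positions
print(masked.equals(same))
```

This is why df[msk] does not drop any rows. To filter rows, the mask has to be reduced to one boolean per row first.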

Or, as you originally indicated, select only the rows where every column's condition is True:

In [7]: idx = msk.all(axis=1)
In [8]: df[idx]
Out [8]:
Empty DataFrame
Columns: [a,b]
Index: []

Or, if one row were True:

In [9]: idx[0] = True
In [10]: df[idx]
Out [10]:
   a  b
0  1  2

To address the original question after the clarification in the comments, where we want different filtering criteria for different columns:

In [11]: msk1 = df[['a']] < 0
In [12]: msk2 = df[['b']] > 3
In [13]: msk = concat((msk1, msk2), axis=1)
In [14]: slct = msk.all(axis=1)
In [15]: df.loc[slct]
Out [15]:
   a  b
1 -3  5
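
For completeness, here is a sketch of the same approach applied to the data from the question. It assumes the rule implied by the question's loop (columns at even positions must be greater than 0, columns at odd positions must be less than 1). The names masks, mask and outcome are illustrative only:

```python
import pandas as pd

# Data from the question
df = pd.DataFrame(
    {
        "DE": [0, 1, 0, 0, 0],
        "FL": [1, 0, 0, 1, 0],
        "GA": [0, 1, 1, 0, 0],
        "IA": [0, 0, 0, 0, 0],
        "ID": [0, 1, 0, 0, 0],
    }
)

# One single-column boolean mask per column:
# even positions must be > 0, odd positions must be < 1
masks = [
    df.iloc[:, [i]] > 0 if i % 2 == 0 else df.iloc[:, [i]] < 1
    for i in range(df.shape[1])
]
mask = pd.concat(masks, axis=1)

# Keep only the rows where every column's condition holds
outcome = df[mask.all(axis=1)]
print(outcome)
```

Only the row with index 1 satisfies all five conditions, which matches the expected outcome given in the question.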
