Filtering a data frame in pandas


Question

I have a data frame as follows:

import pandas as pd
dic = {'wteam': [2, 3, 4, 2, 4], 'lteam': [3, 4, 2, 4, 2]}
df = pd.DataFrame(dic)

   lteam  wteam
0    3      2
1    4      3
2    2      4
3    4      2
4    3      4

I need a new data frame containing only the rows that have 2 in lteam or wteam.

        lteam  wteam
    0    3      2
    2    2      4
    3    4      2

How do I do this in pandas?

Recommended answer

The output shown for your starting df is wrong: the last row should be [2, 4]. That aside, we can mask the df with a boolean comparison, drop the rows that are entirely NaN, and pass the index of what remains to loc:

In [15]:

df.loc[df[df==2].dropna(thresh=1).index]
Out[15]:
   lteam  wteam
0      3      2
2      2      4
3      4      2
4      2      4

Breaking this down:

In [16]:

df[df==2]
Out[16]:
   lteam  wteam
0    NaN      2
1    NaN    NaN
2      2    NaN
3    NaN      2
4      2    NaN

In [17]:

df[df==2].dropna(thresh=1)
Out[17]:
   lteam  wteam
0    NaN      2
2      2    NaN
3    NaN      2
4      2    NaN
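
The three steps above can be reproduced as a standalone script. This is a minimal sketch that rebuilds the data from the question; the intermediate names `masked`, `kept`, and `result` are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Data from the question, with the column order written out explicitly.
df = pd.DataFrame({'lteam': [3, 4, 2, 4, 2], 'wteam': [2, 3, 4, 2, 4]})

masked = df[df == 2]             # cells that are not equal to 2 become NaN
kept = masked.dropna(thresh=1)   # keep rows with at least one non-NaN cell
result = df.loc[kept.index]      # look up the surviving labels in the original df

print(result)
```

Newer pandas versions may print the masked values as 2.0 rather than 2, because a column that holds NaN is stored as float.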

A more succinct method is to supply two boolean conditions:

In [18]:

df[(df.lteam == 2) | (df.wteam == 2)]
Out[18]:
   lteam  wteam
0      3      2
2      2      4
3      4      2
4      2      4

This requires the bitwise | operator, with parentheses around each condition, because | binds more tightly than == in Python.
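
To see why the parentheses matter, the sketch below runs the filter with and without them on the data from the question. The variable names `ok` and `failed` are illustrative only. Without parentheses, Python reads the expression as the chained comparison `df.lteam == (2 | df.wteam) == 2`, which, as far as I know, fails in pandas with a ValueError about the ambiguous truth value of a Series.

```python
import pandas as pd

df = pd.DataFrame({'lteam': [3, 4, 2, 4, 2], 'wteam': [2, 3, 4, 2, 4]})

# With parentheses: each comparison is evaluated first,
# then the two boolean Series are combined element-wise.
ok = df[(df.lteam == 2) | (df.wteam == 2)]
print(ok)

# Without parentheses: parsed as df.lteam == (2 | df.wteam) == 2,
# a chained comparison that needs the truth value of a whole Series.
try:
    df[df.lteam == 2 | df.wteam == 2]
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)
```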

The first method scales better if you have many columns, because it never names them individually, but for a simple dataset like yours the second method is fine.
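
As a rough illustration of the many-columns case, the sketch below adds a hypothetical third column, `week`, which is not part of the question, and filters without naming any column. The `any(axis=1)` variant is not from the original answer; it is shown only as another way to express "at least one cell in the row equals 2".

```python
import pandas as pd

# 'week' is a hypothetical extra column added only for illustration.
df = pd.DataFrame({
    'lteam': [3, 4, 2, 4, 2],
    'wteam': [2, 3, 4, 2, 4],
    'week': [1, 1, 2, 5, 5],
})

# First method from the answer: no column names needed.
by_mask = df.loc[df[df == 2].dropna(thresh=1).index]

# Variant (not from the original answer): reduce the boolean frame row-wise.
by_any = df[(df == 2).any(axis=1)]

print(by_any)
```

Both expressions check every column, so to restrict the search to some columns you would apply them to a subset such as `df[['lteam', 'wteam']]` first.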
