Filtering a data frame in pandas
Question
I have a data frame as follows:
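import pandas as pd
dic = {'wteam': [2, 3, 4, 2, 4], 'lteam': [3, 4, 2, 4, 2]}
df = pd.DataFrame(dic, columns=['lteam', 'wteam'])  # fix the column order to match the output below
df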
   lteam  wteam
0      3      2
1      4      3
2      2      4
3      4      2
4      3      4
I need a new data frame containing only the rows that have 2 in lteam or wteam:
   lteam  wteam
0      3      2
2      2      4
3      4      2
How do I do this in pandas?
Recommended answer
The output shown for your starting df is wrong: the last row should be [2, 4]. Aside from that, we can call loc on the index of a boolean-filtered df after dropping the rows that contain only NaN values:
In [15]:
df.loc[df[df==2].dropna(thresh=1).index]
Out[15]:
   lteam  wteam
0      3      2
2      2      4
3      4      2
4      2      4
Breaking this down:
In [16]:
df[df==2]
Out[16]:
   lteam  wteam
0    NaN      2
1    NaN    NaN
2      2    NaN
3    NaN      2
4      2    NaN
In [17]:
df[df==2].dropna(thresh=1)
Out[17]:
   lteam  wteam
0    NaN      2
2      2    NaN
3    NaN      2
4      2    NaN
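The breakdown above can be collected into one self-contained script. This is only a sketch of the same steps: the intermediate names masked, kept and result are illustrative and do not appear in the original answer, and the column order is fixed explicitly so the printed frame lines up with the session output.

```python
import pandas as pd

# Rebuild the frame from the question; columns= fixes the display order.
df = pd.DataFrame(
    {'wteam': [2, 3, 4, 2, 4], 'lteam': [3, 4, 2, 4, 2]},
    columns=['lteam', 'wteam'],
)

masked = df[df == 2]             # cells that are not equal to 2 become NaN
kept = masked.dropna(thresh=1)   # keep rows with at least one non-NaN cell
result = df.loc[kept.index]      # look the surviving labels up in the original frame

print(result)
```

Because the final lookup goes back to df, the result holds the original values rather than the NaN-masked ones.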
A more succinct method is to supply two boolean conditions:
In [18]:
df[(df.lteam == 2) | (df.wteam == 2)]
Out[18]:
   lteam  wteam
0      3      2
2      2      4
3      4      2
4      2      4
This requires the bitwise | operator, with parentheses around each condition because | binds more tightly than == in Python's operator precedence.
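To make the precedence point concrete, here is a small sketch that runs the filter with and without the parentheses. The names mask, filtered and raised are illustrative only. Without parentheses, Python groups the expression as the chained comparison df.lteam == (2 | df.wteam) == 2, which needs a single truth value for a whole Series, and pandas refuses to provide one.

```python
import pandas as pd

df = pd.DataFrame({'lteam': [3, 4, 2, 4, 2], 'wteam': [2, 3, 4, 2, 4]})

# With parentheses: each comparison yields a boolean Series, then | combines them.
mask = (df.lteam == 2) | (df.wteam == 2)
filtered = df[mask]

# Without parentheses: | is evaluated first, leaving a chained comparison
# whose implicit "and" asks for the truth value of a Series.
try:
    df[df.lteam == 2 | df.wteam == 2]
    raised = False
except ValueError:
    raised = True

print(filtered)
print(raised)
```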
The first method is better if you have lots of columns, but for your simple dataset the second method is fine.
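For the many-columns case, one alternative that is not part of the original answer is to reduce the boolean frame row by row with any(axis=1). The sketch below assumes every column should be checked against the value 2.

```python
import pandas as pd

df = pd.DataFrame({'lteam': [3, 4, 2, 4, 2], 'wteam': [2, 3, 4, 2, 4]})

# True for every row in which at least one column equals 2.
mask = (df == 2).any(axis=1)
result = df[mask]

print(result)
```

If only some columns are candidates, select them first (for example df[['lteam', 'wteam']]) before comparing, and still index the full df with the resulting mask.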