Pandas filter rows using multiple fields together
Problem description
I have a pandas DataFrame like this:
In [34]: people = pandas.DataFrame({'name' : ['John', 'John', 'Mike', 'Sarah', 'Julie'], 'age' : [28, 18, 18, 2, 69]})
people = people[['name', 'age']]
people
Out[34]:
name age
0 John 28
1 John 18
2 Mike 18
3 Sarah 2
4 Julie 69
I want to filter this DataFrame using the following tuples:
In [35]: filter = [('John', 28), ('Mike', 18)]
The output should look like this:
Out[35]:
name age
0 John 28
2 Mike 18
I tried doing this:
In [34]: mask = people.isin({'name': ['John', 'Mike'], 'age': [28, 18]}).all(axis=1)
people[mask]
However, it shows me both Johns because it filters each column independently (the ages of both Johns are present in the age array).
Out[34]:
name age
0 John 28
1 John 18
2 Mike 18
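The behaviour described above can be reproduced in a self-contained sketch. It rebuilds the same DataFrame and keeps the intermediate boolean frame, so the per-column check is visible. The names `per_column` and `kept` are illustrative and not from the original post.

```python
import pandas as pd

people = pd.DataFrame({'name': ['John', 'John', 'Mike', 'Sarah', 'Julie'],
                       'age': [28, 18, 18, 2, 69]})

# With a dict, DataFrame.isin tests each column against its own list only;
# it has no notion of which name belongs with which age.
per_column = people.isin({'name': ['John', 'Mike'], 'age': [28, 18]})

# A row passes when every column passed its own independent test,
# so ('John', 18) slips through: 'John' is an allowed name and 18 an allowed age.
mask = per_column.all(axis=1)
kept = people[mask]
print(kept)
```

Row 1 (John, 18) is kept even though that pair is not in the list of wanted tuples, which is why the values must be matched as pairs rather than column by column.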
How do I filter rows based on multiple fields taken together?
Recommended answer
This should work:
people.set_index(people.columns.tolist(), drop=False).loc[filter].reset_index(drop=True)
Cleaned up and with explanation
# set_index with the columns you want to reference in tuples;
# drop=False ensures the cols stay in the dataframe
cols = ['name', 'age']
people = people.set_index(cols, drop=False)
# people.loc[filter] does what you want, but the result now has an index
# that was not there before; reset_index(drop=True) gets rid of that index
people.loc[filter].reset_index(drop=True)
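As a possible alternative to the answer above (not the answerer's method), the same pairwise match can be sketched with a boolean mask built from a temporary MultiIndex. This keeps the original row labels 0 and 2, as in the output the question asks for. The names `wanted`, `pair_index` and `result` are illustrative, and `wanted` replaces `filter` only to avoid shadowing the Python builtin.

```python
import pandas as pd

people = pd.DataFrame({'name': ['John', 'John', 'Mike', 'Sarah', 'Julie'],
                       'age': [28, 18, 18, 2, 69]})
wanted = [('John', 28), ('Mike', 18)]  # same tuples as `filter` in the question

cols = ['name', 'age']

# Build a MultiIndex whose entries are the (name, age) pairs of each row,
# without modifying the DataFrame itself.
pair_index = pd.MultiIndex.from_frame(people[cols])

# MultiIndex.isin compares whole tuples, so name and age are matched together.
mask = pair_index.isin(wanted)

result = people[mask]
print(result)
```

Unlike the `.loc` approach, this sketch silently ignores tuples in `wanted` that match no row, which may or may not be what you want.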