Pandas filter rows using multiple fields together


Problem description

I have a pandas DataFrame like this:

In [34]: people = pandas.DataFrame({'name' : ['John', 'John', 'Mike', 'Sarah', 'Julie'], 'age' : [28, 18, 18, 2, 69]})
         people  = people[['name', 'age']]
         people

Out[34]:    
    name    age
0   John    28
1   John    18
2   Mike    18
3   Sarah   2
4   Julie   69

I want to filter this DataFrame using the following tuples:

In [35]: filter = [('John', 28), ('Mike', 18)]

The output should look like this:

Out[35]: 
    name    age
0   John    28
2   Mike    18

I tried doing this:

In [34]: mask = people.isin({'name': ['John', 'Mike'], 'age': [28, 18]}).all(axis=1)
         filtered = people[mask]
         filtered

However it shows me both Johns because it filters each column independently (the ages of both Johns are present in the age array).

Out[34]: 
    name    age
0   John    28
1   John    18
2   Mike    18
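
To make the failure concrete, here is a minimal sketch that reproduces the per-column check. The variable name `per_column` is illustrative only. With a dict argument, `DataFrame.isin` tests each column against its own list, so the row `('John', 18)` passes both tests even though that pair was never requested.

```python
import pandas as pd

people = pd.DataFrame({
    'name': ['John', 'John', 'Mike', 'Sarah', 'Julie'],
    'age': [28, 18, 18, 2, 69],
})

# Each column is checked against its own list, independently of the other.
per_column = people.isin({'name': ['John', 'Mike'], 'age': [28, 18]})

# Row 1 (John, 18) is True in both columns, so it survives .all(axis=1).
print(per_column.all(axis=1).tolist())  # [True, True, True, False, False]
```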

How do I filter rows based on multiple fields taken together?

Recommended answer

This should work:

people.set_index(people.columns.tolist(), drop=False).loc[filter].reset_index(drop=True)

Cleaned up and with explanation

# set_index with the columns you want to reference in tuples
cols = ['name', 'age']
people = people.set_index(cols, drop=False)
#                                   ^
#                                   |
#   ensure the cols stay in dataframe

#   does what you
#   want but now has
#   index that was
#   not there
# /--------------\
people.loc[filter].reset_index(drop=True)
#                 \---------------------/
#                  Gets rid of that index
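
As a complementary sketch (a different technique from the `set_index` / `.loc` approach above, not part of the original answer), the same selection can be written as a boolean mask: build a `MultiIndex` from the relevant columns and test membership with `MultiIndex.isin`. This assumes a pandas version that provides `MultiIndex.from_frame` (0.24 or later). The names `wanted`, `mask` and `result` are illustrative, and `wanted` is used instead of `filter` only to avoid shadowing the Python builtin.

```python
import pandas as pd

people = pd.DataFrame({
    'name': ['John', 'John', 'Mike', 'Sarah', 'Julie'],
    'age': [28, 18, 18, 2, 69],
})
wanted = [('John', 28), ('Mike', 18)]
cols = ['name', 'age']

# Treat each row's (name, age) pair as one key and test it against the tuples.
mask = pd.MultiIndex.from_frame(people[cols]).isin(wanted)

# Boolean indexing keeps the original row labels (0 and 2 here).
result = people[mask]
print(result)
```

Because this is plain boolean indexing, the DataFrame's index is left untouched, and a tuple that does not occur in the data simply matches no rows.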
