Finding common rows (intersection) in two Pandas dataframes


Question

Assume I have two dataframes of this format (call them df1 and df2):

+------------------------+------------------------+--------+
|        user_id         |      business_id       | rating |
+------------------------+------------------------+--------+
| rLtl8ZkDX5vH5nAx9C3q5Q | eIxSLxzIlfExI6vgAbn2JA |      4 |
| C6IOtaaYdLIT5fWd7ZYIuA | eIxSLxzIlfExI6vgAbn2JA |      5 |
| mlBC3pN9GXlUUfQi1qBBZA | KoIRdcIfh3XWxiCeV1BDmA |      3 |
+------------------------+------------------------+--------+

I'm looking to get a dataframe of all the rows that have a common user_id in df1 and df2 (i.e., if a user_id is in both df1 and df2, include the two rows in the output dataframe).

I can think of many ways to approach this, but they all strike me as clunky. For example, we could find all the unique user_ids in each dataframe, create a set of each, find their intersection, filter the two dataframes with the resulting set, and concatenate the two filtered dataframes.

Maybe that's the best approach, but I know Pandas is clever. Is there a simpler way to do this? I've looked at merge, but I don't think that's what I need.
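For reference, the set-based approach described in the question could be sketched roughly as follows. The sample data and the names common_ids, filtered1, filtered2, and common_rows are made up for illustration and are not part of the original post.

```python
import pandas as pd

# Hypothetical sample data in the same three-column format as above
df1 = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "business_id": ["b1", "b1", "b2"],
    "rating": [4, 5, 3],
})
df2 = pd.DataFrame({
    "user_id": ["u2", "u3", "u4"],
    "business_id": ["b3", "b4", "b4"],
    "rating": [2, 5, 1],
})

# user_ids that appear in both dataframes
common_ids = set(df1["user_id"]) & set(df2["user_id"])

# Keep only the rows whose user_id is in the intersection
filtered1 = df1[df1["user_id"].isin(common_ids)]
filtered2 = df2[df2["user_id"].isin(common_ids)]

# Stack the two filtered dataframes on top of each other
common_rows = pd.concat([filtered1, filtered2], ignore_index=True)
print(common_rows)
```

With this sample data, the result keeps the original three columns and contains the rows for u2 and u3 from both dataframes.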

Recommended answer

Briefly, the answer to the OP's question using merge is simply:

s1 = pd.merge(df1, df2, how='inner', on=['user_id'])

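This gives s1 with 5 columns: user_id plus the other two columns from each of df1 and df2. Because business_id and rating exist in both dataframes, pandas by default appends the suffixes _x (for df1) and _y (for df2) to those column names.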
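As a small illustration, here is the same merge run on made-up data (the values below are hypothetical, not from the original post). It shows the shape of the result: one row per matching user_id pair, with the columns from df1 and df2 placed side by side rather than stacked.

```python
import pandas as pd

# Hypothetical sample data in the format from the question
df1 = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "business_id": ["b1", "b1", "b2"],
    "rating": [4, 5, 3],
})
df2 = pd.DataFrame({
    "user_id": ["u2", "u3", "u4"],
    "business_id": ["b3", "b4", "b4"],
    "rating": [2, 5, 1],
})

# Inner join: keep only user_ids present in both dataframes
s1 = pd.merge(df1, df2, how='inner', on=['user_id'])

print(list(s1.columns))
# ['user_id', 'business_id_x', 'rating_x', 'business_id_y', 'rating_y']
print(s1)
```

Note that if a user_id occurs several times in either dataframe, an inner merge produces one row for every pairing of matching rows, so the output can have more rows than either input.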
