Finding common rows (intersection) in two Pandas dataframes
Question
Assume I have two dataframes of this format (call them df1 and df2):
+------------------------+------------------------+--------+
| user_id                | business_id            | rating |
+------------------------+------------------------+--------+
| rLtl8ZkDX5vH5nAx9C3q5Q | eIxSLxzIlfExI6vgAbn2JA | 4      |
| C6IOtaaYdLIT5fWd7ZYIuA | eIxSLxzIlfExI6vgAbn2JA | 5      |
| mlBC3pN9GXlUUfQi1qBBZA | KoIRdcIfh3XWxiCeV1BDmA | 3      |
+------------------------+------------------------+--------+
I'm looking to get a dataframe of all the rows that have a common user_id in df1 and df2 (i.e. if a user_id is in both df1 and df2, include the two rows in the output dataframe).
I can think of many ways to approach this, but they all strike me as clunky. For example, we could find all the unique user_id values in each dataframe, create a set of each, find their intersection, filter the two dataframes with the resulting set, and concatenate the two filtered dataframes.
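For reference, the set-based approach described above might look roughly like the sketch below. The sample frames and their short IDs (u1, b1, and so on) are hypothetical stand-ins for the real data, and the variable names common_ids and common_rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data in the same three-column format as the question
df1 = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "business_id": ["b1", "b1", "b2"],
    "rating": [4, 5, 3],
})
df2 = pd.DataFrame({
    "user_id": ["u2", "u3", "u4"],
    "business_id": ["b3", "b4", "b4"],
    "rating": [2, 5, 1],
})

# user_id values that appear in both frames
common_ids = set(df1["user_id"]) & set(df2["user_id"])

# Keep only the rows whose user_id is shared, then stack the two filtered frames
common_rows = pd.concat(
    [
        df1[df1["user_id"].isin(common_ids)],
        df2[df2["user_id"].isin(common_ids)],
    ],
    ignore_index=True,
)
print(common_rows)
```

The result keeps the original three columns, with the matching rows from df1 followed by the matching rows from df2.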
Maybe that's the best approach, but I know Pandas is clever. Is there a simpler way to do this? I've looked at merge, but I don't think that's what I need.
Recommended answer
Briefly, the answer to the OP using pd.merge is simply:
s1 = pd.merge(df1, df2, how='inner', on=['user_id'])
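This gives s1 with 5 columns: user_id and the other two columns from each of df1 and df2. Because business_id and rating exist in both frames, pandas appends its default suffixes _x (from df1) and _y (from df2) to those names.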
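To make the shape of that result concrete, here is a small sketch of the same merge on hypothetical data (the short IDs below are made up, not taken from the question).

```python
import pandas as pd

# Hypothetical sample data in the same three-column format as the question
df1 = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "business_id": ["b1", "b1", "b2"],
    "rating": [4, 5, 3],
})
df2 = pd.DataFrame({
    "user_id": ["u2", "u3", "u4"],
    "business_id": ["b3", "b4", "b4"],
    "rating": [2, 5, 1],
})

# Inner join on user_id: only user_ids present in both frames survive
s1 = pd.merge(df1, df2, how="inner", on=["user_id"])

print(s1.columns.tolist())
# ['user_id', 'business_id_x', 'rating_x', 'business_id_y', 'rating_y']
```

Note that the merge places the df1 and df2 values side by side in one row per matching pair, rather than stacking the original rows. If a user_id occurs several times in either frame, the result contains one row for every df1/df2 combination for that user.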