Join pandas dataframes based on column values


Question

I'm quite new to pandas dataframes, and I'm having some trouble joining two tables.

The first df has only 3 columns:

DF1:
item_id    position    document_id
336        1           10
337        2           10
338        3           10
1001       1           11
1002       2           11
1003       3           11
38         10          146

The second has exactly the same two columns, item_id and document_id (and plenty of others):

DF2
item_id    document_id    col1    col2   col3    ...
337        10             ...     ...    ...
1002       11             ...     ...    ...
1003       11             ...     ...    ...

What I need is to perform an operation which, in SQL, would look as follows:

DF1 join DF2 on 
DF1.document_id = DF2.document_id
and
DF1.item_id = DF2.item_id

As a result, I want to see DF2 complemented with the 'position' column:

item_id    document_id    position    col1   col2   col3   ...

What is a good way to do this using pandas?

Thanks!

Recommended answer

I think you need merge with the default inner join. This assumes there are no duplicated combinations of values in the two key columns:

print(df2)
   item_id  document_id col1  col2  col3
0      337           10    s     4     7
1     1002           11    d     5     8
2     1003           11    f     7     0

# Inner join (the default) on both key columns
df = pd.merge(df1, df2, on=['document_id', 'item_id'])
print(df)
   item_id  position  document_id col1  col2  col3
0      337         2           10    s     4     7
1     1002         2           11    d     5     8
2     1003         3           11    f     7     0
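
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds both frames from the sample data shown above and repeats the merge. The `validate='one_to_one'` argument is an addition that is not part of the original answer: it asks `merge` to raise an error if a key combination is duplicated in either frame, which is one way to enforce the uniqueness assumption. The `df1_dup` frame is a hypothetical variant, made up here only to show what a duplicated key pair does to the result.

```python
import pandas as pd

# Sample data copied from the question and the answer above.
df1 = pd.DataFrame({
    'item_id': [336, 337, 338, 1001, 1002, 1003, 38],
    'position': [1, 2, 3, 1, 2, 3, 10],
    'document_id': [10, 10, 10, 11, 11, 11, 146],
})
df2 = pd.DataFrame({
    'item_id': [337, 1002, 1003],
    'document_id': [10, 11, 11],
    'col1': ['s', 'd', 'f'],
    'col2': [4, 5, 7],
    'col3': [7, 8, 0],
})

keys = ['document_id', 'item_id']

# validate='one_to_one' (an addition, not in the original answer) makes
# merge raise pandas.errors.MergeError if a key pair is duplicated.
df = pd.merge(df1, df2, on=keys, validate='one_to_one')
print(df)

# Hypothetical variant: repeat the (337, 10) row in df1. A plain inner
# join then returns one row per matching pair, so that key appears twice.
df1_dup = pd.concat([df1, df1.iloc[[1]]], ignore_index=True)
df_dup = pd.merge(df1_dup, df2, on=keys)
print(len(df), len(df_dup))
```

With unique keys the result has one row per DF2 row; with the duplicated key the row count grows, which is why the uniqueness caveat matters.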

If the position column needs to be the third column, put df2 on the left of the merge and reorder the columns:

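# With df2 on the left its columns come first, so 'position' ends up last
df = pd.merge(df2, df1, on=['document_id', 'item_id'])
cols = df.columns.tolist()
# Reorder: the two key columns, then 'position', then the remaining columns
df = df[cols[:2] + cols[-1:] + cols[2:-1]]
print(df)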
   item_id  document_id  position col1  col2  col3
0      337           10         2    s     4     7
1     1002           11         2    d     5     8
2     1003           11         3    f     7     0
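
As a possible alternative to slicing the column list, the sketch below moves the column by name with `DataFrame.pop` and `DataFrame.insert`. It also uses `how='left'`, which is an assumption about what the question wants: a left join from df2 keeps every df2 row even when df1 has no matching key pair, whereas the inner join above would drop such rows. With the sample data every row matches, so both joins give the same rows. The name `out` is chosen here for illustration.

```python
import pandas as pd

# Sample data copied from the question and the answer above.
df1 = pd.DataFrame({
    'item_id': [336, 337, 338, 1001, 1002, 1003, 38],
    'position': [1, 2, 3, 1, 2, 3, 10],
    'document_id': [10, 10, 10, 11, 11, 11, 146],
})
df2 = pd.DataFrame({
    'item_id': [337, 1002, 1003],
    'document_id': [10, 11, 11],
    'col1': ['s', 'd', 'f'],
    'col2': [4, 5, 7],
    'col3': [7, 8, 0],
})

# how='left' (an assumption, not in the original answer) keeps all df2 rows;
# rows without a match in df1 would get NaN in 'position'.
out = df2.merge(df1, on=['document_id', 'item_id'], how='left')

# pop() removes the column and returns it; insert() puts it back at index 2.
out.insert(2, 'position', out.pop('position'))
print(out)
```

Moving the column by name avoids relying on `position` being the last column after the merge.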
