pandas 左外连接多列上的多个数据框 [英] Pandas left outer join multiple dataframes on multiple columns

查看:48
本文介绍了 pandas 左外连接多列上的多个数据框的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是使用 DataFrame 的新手,我想知道如何在一系列表的多列上执行等效于左外连接的 SQL

I am new to using DataFrame and I would like to know how to perform a SQL equivalent of left outer join on multiple columns on a series of tables

示例:

df1: 
Year    Week    Colour    Val1 
2014       A       Red      50
2014       B       Red      60
2014       B     Black      70
2014       C       Red      10
2014       D     Green      20

df2:
Year    Week    Colour    Val2
2014       A     Black      30
2014       B     Black     100
2014       C     Green      50
2014       C       Red      20
2014       D       Red      40

df3:
Year    Week    Colour    Val3
2013       B       Red      60
2013       C     Black      80
2013       B     Black      10
2013       D     Green      20
2013       D       Red      50

本质上我想做这样的 SQL 代码(注意 df3 没有加入 Year):

Essentially I want to do something like this SQL code (Notice that df3 is not joined on Year):

SELECT df1.*, df2.Val2, df3.Val3
FROM df1
  LEFT OUTER JOIN df2
    ON df1.Year = df2.Year
    AND df1.Week = df2.Week
    AND df1.Colour = df2.Colour
  LEFT OUTER JOIN df3
    ON df1.Week = df3.Week
    AND df1.Colour = df3.Colour

结果应该是这样的:

Year    Week    Colour    Val1    Val2    Val3
2014       A       Red      50    Null    Null
2014       B       Red      60    Null      60
2014       B     Black      70     100    Null
2014       C       Red      10      20    Null
2014       D     Green      20    Null    Null

我曾尝试使用合并和连接,但无法弄清楚如何在多个表上以及何时涉及多个关节时执行此操作.有人可以帮我吗?

I have tried using merge and join but can't figure out how to do it on multiple tables and when there are multiple joints involved. Could someone help me on this please?

谢谢

推荐答案

分两步合并,先df1df2,然后将结果<代码>df3.

Merge them in two steps, df1 and df2 first, and then the result of that to df3.

In [33]: s1 = pd.merge(df1, df2, how='left', on=['Year', 'Week', 'Colour'])

我从 df3 中删除了 year,因为您在最后一次加入时不需要它.

I dropped year from df3 since you don't need it for the last join.

In [39]: df = pd.merge(s1, df3[['Week', 'Colour', 'Val3']],
                       how='left', on=['Week', 'Colour'])

In [40]: df
Out[40]: 
   Year Week Colour  Val1  Val2 Val3
0  2014    A    Red    50   NaN  NaN
1  2014    B    Red    60   NaN   60
2  2014    B  Black    70   100   10
3  2014    C    Red    10    20  NaN
4  2014    D  Green    20   NaN   20

[5 rows x 6 columns]

这篇关于 pandas 左外连接多列上的多个数据框的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆