Complex Dataframe Merge Python Pandas

Question

I am trying to merge 2 dataframes and can't quite get what I'm looking for.

Dataframe 1 looks like this.

Index       Date      Data1   Data2
  A    2007-07-21      76      32
  A    2007-08-13      nan     23
  B    2007-06-15      53      nan
  B    2007-07-15      87      39

Dataframe 2 looks like this:

Index       Date      Data3   Data4
  A    2007-07-24      14      nan
  A    2007-08-13      67      51
  B    2007-06-21      32      36
  B    2007-07-15      nan     91

The same indices are in both dataframes. The index labels contain duplicates. There is some overlap in the dates, but each dataframe also contains unique dates.

What I'd like in my result is the following: rows with the same Index and Date appear ONCE in the result with combined values (Data1, Data2, Data3, Data4). If an Index/Date combination appears in only one of the two dataframes, that combination appears along with the relevant data from the respective dataframe, and with NaNs in the columns from the dataframe where values don't exist.

From the above dataframes the result would look like this:

Index       Date      Data1   Data2  Data3  Data4
  A    2007-07-21      76      32     nan    nan
  A    2007-07-24      nan     nan    14     nan
  A    2007-08-13      nan     23     67      51
  B    2007-06-15      53      nan    nan    nan
  B    2007-06-21      nan     nan    32      36
  B    2007-07-15      87      39     nan     91

This exercise has aspects of a left join but also an outer join. Not sure how to get this using pd.merge or pd.concat.

Thanks in advance for your insight.

Recommended Answer

set_index + concat

pd.concat([df1.set_index(['Index','Date']), df2.set_index(['Index','Date'])], axis=1).reset_index()
Out[1145]: 
  Index        Date  Data1  Data2  Data3  Data4
0     A  2007-07-21   76.0   32.0    NaN    NaN
1     A  2007-07-24    NaN    NaN   14.0    NaN
2     A  2007-08-13    NaN   23.0   67.0   51.0
3     B  2007-06-15   53.0    NaN    NaN    NaN
4     B  2007-06-21    NaN    NaN   32.0   36.0
5     B  2007-07-15   87.0   39.0    NaN   91.0
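
To reproduce the result above, the two frames from the question can be rebuilt as in the sketch below. It makes some assumptions: the names df1 and df2 follow the answer, the dates are kept as plain strings, and each Index/Date pair occurs at most once within each frame (concat along axis=1 aligns rows on the index, so repeated pairs inside one frame would not align cleanly). The explicit sort_index() is an addition here to keep the row order predictable.

```python
import numpy as np
import pandas as pd

# Sample frames copied from the question (dates kept as strings for simplicity).
df1 = pd.DataFrame({
    "Index": ["A", "A", "B", "B"],
    "Date": ["2007-07-21", "2007-08-13", "2007-06-15", "2007-07-15"],
    "Data1": [76, np.nan, 53, 87],
    "Data2": [32, 23, np.nan, 39],
})
df2 = pd.DataFrame({
    "Index": ["A", "A", "B", "B"],
    "Date": ["2007-07-24", "2007-08-13", "2007-06-21", "2007-07-15"],
    "Data3": [14, 67, 32, np.nan],
    "Data4": [np.nan, 51, 36, 91],
})

keys = ["Index", "Date"]

# Move the key columns into the index, align the frames side by side,
# then sort on the keys and turn them back into ordinary columns.
result = (
    pd.concat([df1.set_index(keys), df2.set_index(keys)], axis=1)
    .sort_index()
    .reset_index()
)
print(result)
```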

Or we can use merge:

df1.merge(df2,on=['Index','Date'],how='outer')
Out[1147]: 
  Index        Date  Data1  Data2  Data3  Data4
0     A  2007-07-21   76.0   32.0    NaN    NaN
1     A  2007-08-13    NaN   23.0   67.0   51.0
2     B  2007-06-15   53.0    NaN    NaN    NaN
3     B  2007-07-15   87.0   39.0    NaN   91.0
4     A  2007-07-24    NaN    NaN   14.0    NaN
5     B  2007-06-21    NaN    NaN   32.0   36.0
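
The row order of an outer merge may differ between pandas versions, so sorting on the key columns gives the ordering asked for in the question. The sketch below rebuilds the sample frames and also passes indicator=True, which adds a _merge column recording whether each row came from the left frame, the right frame, or both. The name merged is only illustrative, and the _merge column can be dropped once it has been inspected.

```python
import numpy as np
import pandas as pd

# Sample frames copied from the question (dates kept as strings for simplicity).
df1 = pd.DataFrame({
    "Index": ["A", "A", "B", "B"],
    "Date": ["2007-07-21", "2007-08-13", "2007-06-15", "2007-07-15"],
    "Data1": [76, np.nan, 53, 87],
    "Data2": [32, 23, np.nan, 39],
})
df2 = pd.DataFrame({
    "Index": ["A", "A", "B", "B"],
    "Date": ["2007-07-24", "2007-08-13", "2007-06-21", "2007-07-15"],
    "Data3": [14, 67, 32, np.nan],
    "Data4": [np.nan, 51, 36, 91],
})

# Outer merge keeps every Index/Date pair from either frame;
# sorting afterwards makes the row order independent of merge internals.
merged = (
    df1.merge(df2, on=["Index", "Date"], how="outer", indicator=True)
    .sort_values(["Index", "Date"])
    .reset_index(drop=True)
)
print(merged)
```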
