Pandas merge two dataframes with different columns


Problem description

I'm surely missing something simple here. Trying to merge two dataframes in pandas that have mostly the same column names, but the right dataframe has some columns that the left doesn't have, and vice versa.

>df_may

  id  quantity  attr_1  attr_2
0  1        20       0       1
1  2        23       1       1
2  3        19       1       1
3  4        19       0       0

>df_jun

  id  quantity  attr_1  attr_3
0  5         8       1       0
1  6        13       0       1
2  7        20       1       1
3  8        25       1       1

I've tried joining with an outer join:

mayjundf = pd.DataFrame.merge(df_may, df_jun, how="outer")

But that yields:

Left data columns not unique: Index([....

I've also tried joining on a single column (e.g. on="id"), but that duplicates every column except "id" with suffixes like attr_1_x and attr_1_y, which is not ideal. I've also passed the entire list of columns (there are many) to "on":

mayjundf = pd.DataFrame.merge(df_may, df_jun, how="outer", on=list(df_may.columns.values))

Which yields:

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

What am I missing? I'd like to get a df with all rows appended, and attr_1, attr_2, attr_3 populated where possible, NaN where they don't show up. This seems like a pretty typical workflow for data munging, but I'm stuck.

Thanks in advance.

Solution

I think in this case concat is what you want:

In [12]:

pd.concat([df_may, df_jun], axis=0, ignore_index=True)
Out[12]:
   attr_1  attr_2  attr_3  id  quantity
0       0       1     NaN   1        20
1       1       1     NaN   2        23
2       1       1     NaN   3        19
3       0       0     NaN   4        19
4       1     NaN       0   5         8
5       0     NaN       1   6        13
6       1     NaN       1   7        20
7       1     NaN       1   8        25

By passing axis=0 you stack the dataframes on top of each other, which I believe is what you want. Columns are aligned by name, and NaN is filled in wherever a column is absent from one of the dataframes.
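For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the two frames from the tables in the question and stacks them. The outer merge on the shared columns at the end is my own addition for comparison, not part of the answer above. It assumes the column names are unique in both frames, and column order in the result may differ between pandas versions.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question.
df_may = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "quantity": [20, 23, 19, 19],
    "attr_1": [0, 1, 1, 0],
    "attr_2": [1, 1, 1, 0],
})
df_jun = pd.DataFrame({
    "id": [5, 6, 7, 8],
    "quantity": [8, 13, 20, 25],
    "attr_1": [1, 0, 1, 1],
    "attr_3": [0, 1, 1, 1],
})

# Stack the rows; columns are aligned by name and gaps become NaN.
stacked = pd.concat([df_may, df_jun], axis=0, ignore_index=True)

# Comparison only (an assumption, not from the answer): an outer merge
# restricted to the columns that both frames actually share.
shared = [c for c in df_may.columns if c in df_jun.columns]
merged = pd.merge(df_may, df_jun, how="outer", on=shared)

print(stacked)
print(merged)
```

With this sample data both approaches give eight rows, because no row in df_may matches a row in df_jun on the shared columns. In general they are not equivalent: merge combines rows whose key values match, while concat always appends, so concat is the safer choice when the goal is simply to append all rows.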
