Pandas merge two dataframes with different columns
Problem description
I'm surely missing something simple here. Trying to merge two dataframes in pandas that have mostly the same column names, but the right dataframe has some columns that the left doesn't have, and vice versa.
>df_may
   id  quantity  attr_1  attr_2
0   1        20       0       1
1   2        23       1       1
2   3        19       1       1
3   4        19       0       0
>df_jun
   id  quantity  attr_1  attr_3
0   5         8       1       0
1   6        13       0       1
2   7        20       1       1
3   8        25       1       1
I've tried joining with an outer join:
mayjundf = pd.DataFrame.merge(df_may, df_jun, how="outer")
But that yields:
Left data columns not unique: Index([....
I've also specified a single column to join on (on = "id", e.g.), but that duplicates all columns except "id" like attr_1_x, attr_1_y, which is not ideal. I've also passed the entire list of columns (there are many) to "on":
mayjundf = pd.DataFrame.merge(df_may, df_jun, how="outer", on=list(df_may.columns.values))
Which yields:
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
What am I missing? I'd like to get a df with all rows appended, and attr_1, attr_2, attr_3 populated where possible, NaN where they don't show up. This seems like a pretty typical workflow for data munging, but I'm stuck.
Thanks in advance.
Solution
I think in this case concat is what you want:
In [12]: pd.concat([df_may, df_jun], axis=0, ignore_index=True)
Out[12]:
   attr_1  attr_2  attr_3  id  quantity
0       0       1     NaN   1        20
1       1       1     NaN   2        23
2       1       1     NaN   3        19
3       0       0     NaN   4        19
4       1     NaN       0   5         8
5       0     NaN       1   6        13
6       1     NaN       1   7        20
7       1     NaN       1   8        25
By passing axis=0 you are stacking the DataFrames on top of each other, which I believe is what you want. Columns are aligned by name, and NaN is filled in wherever a column is absent from one of the DataFrames.
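For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the two frames from the question by hand and stacks them with concat. The variable names `stacked`, `shared`, and `merged` are mine, not from the original post. As a hedged alternative, it also shows an outer merge keyed only on the columns both frames share, which should give the same rows for this data. Passing every column of df_may to `on` cannot work, because attr_2 does not exist in df_jun.

```python
import pandas as pd

# Rebuild the sample frames from the question.
df_may = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "quantity": [20, 23, 19, 19],
    "attr_1": [0, 1, 1, 0],
    "attr_2": [1, 1, 1, 0],
})
df_jun = pd.DataFrame({
    "id": [5, 6, 7, 8],
    "quantity": [8, 13, 20, 25],
    "attr_1": [1, 0, 1, 1],
    "attr_3": [0, 1, 1, 1],
})

# Stack the rows. Columns are matched by name, and a column missing
# from one frame is filled with NaN for that frame's rows.
stacked = pd.concat([df_may, df_jun], axis=0, ignore_index=True)

# Alternative (my assumption of what the merge attempt was aiming for):
# an outer merge on only the columns present in BOTH frames.
shared = [c for c in df_may.columns if c in df_jun.columns]
merged = pd.merge(df_may, df_jun, how="outer", on=shared)

print(stacked)
print(merged)
```

Depending on the pandas version, the column order may differ from the output shown above, and columns that contain NaN are typically displayed as floats (1.0 rather than 1).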