Pandas Python: Concatenate dataframes having same columns


Question

I have 3 dataframes that all have the same column names. Say:

df1
column1   column2   column3
a         b         c
d         e         f


df2
column1   column2   column3
g         h         i
j         k         l


df3
column1   column2   column3
m         n         o
p         q         r

Each dataframe has different values but the same columns. I tried append and concat, as well as an outer merge, but got errors. Here's what I tried:

df_final = df1.append(df2,sort = True,ignore_index = True).append(df3,sort = True,ignore_index = True)

I also tried: df_final = pd.concat([df1,df2,df3],axis = 1)

But I get this error: AssertionError: Number of manager items must equal union of block items # manager items: 61, # tot_items: 62

I've googled the error, but I can't work out why it's happening in my case. Any guidance is much appreciated!

Recommended answer

I think the problem is duplicated column names in some or all of the DataFrames.

#simulate error
df1.columns = ['column3','column1','column1']
df2.columns = ['column5','column1','column1']
df3.columns = ['column2','column1','column1']

df_final = pd.concat([df1, df2, df3])

AssertionError: Number of manager items must equal union of block items # manager items: 4, # tot_items: 5

You can find the duplicated column names like this:

print (df3.columns[df3.columns.duplicated(keep=False)])
Index(['column1', 'column1'], dtype='object')
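
Since the question involves several DataFrames, it may help to run the same check on each of them at once. The following is only a sketch: the helper name `duplicated_columns` and the sample frames are made up for illustration and mirror the simulated error above.

```python
import pandas as pd


def duplicated_columns(df):
    """Return the column labels that occur more than once in df (hypothetical helper)."""
    dupes = df.columns[df.columns.duplicated(keep=False)]
    return dupes.unique().tolist()


# Illustrative frames: df1 has a repeated label, df2 does not.
df1 = pd.DataFrame(
    [["a", "b", "c"], ["d", "e", "f"]],
    columns=["column3", "column1", "column1"],
)
df2 = pd.DataFrame(
    [["g", "h", "i"], ["j", "k", "l"]],
    columns=["column1", "column2", "column3"],
)

# Map each frame's name to its list of duplicated labels.
report = {name: duplicated_columns(df) for name, df in {"df1": df1, "df2": df2}.items()}
print(report)
```

Any frame that reports a non-empty list is a candidate for one of the fixes below.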


One possible solution is to set the column names from a list:

df3.columns = ['column1','column2','column3']
print (df3)
  column1 column2 column3
0       m       n       o
1       p       q       r

Or remove the columns whose names are duplicated, keeping the first occurrence of each:

df31 = df3.loc[:, ~df3.columns.duplicated()]
print (df31)
  column2 column1
0       m       n
1       p       q

Then concat or append should work fine.
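
As a minimal end-to-end sketch, assuming the three frames from the question and unique column names in each, stacking them row-wise could look like the code below. It uses the default `axis=0` (the question's second attempt used `axis=1`, which places frames side by side instead) and `ignore_index=True` to renumber the rows. `pd.concat` is used rather than `DataFrame.append` because `append` was removed in pandas 2.0.

```python
import pandas as pd

cols = ["column1", "column2", "column3"]

# Sample data taken from the question.
df1 = pd.DataFrame([["a", "b", "c"], ["d", "e", "f"]], columns=cols)
df2 = pd.DataFrame([["g", "h", "i"], ["j", "k", "l"]], columns=cols)
df3 = pd.DataFrame([["m", "n", "o"], ["p", "q", "r"]], columns=cols)

# Guard against the duplicated-label problem before concatenating.
for df in (df1, df2, df3):
    assert df.columns.is_unique, "duplicated column names found"

# Stack the rows of all three frames and renumber the index 0..5.
df_final = pd.concat([df1, df2, df3], ignore_index=True)
print(df_final)
```

The result has six rows and the same three columns as the inputs.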
