Merge multiple data frames with different numbers of columns into one big data frame
Question
I have two CSV files with different numbers of columns and rows. The first CSV file has M columns and N rows; the second has H columns and G rows. Some of the columns have the same name.
I'd like to combine the two into a data frame with the following properties:
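- N + G rows
- the union of the M and H columns
- If column A is in the first CSV file but not in the second, the data frame should contain the same values as the first CSV in the first N entries of A, and NA in the remaining G entries (since the second CSV has no data for A).
Here is an example: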
CSV1
City, Population,
Zagreb, 700000,
Rijeka, 142000
CSV2
City, Area,
Split, 200.00
Osijek, 171.00
Dubrovnik, 143.35
I'd like to build a data frame that looks like this:
City       Population  Area
Zagreb     700000      NA
Rijeka     142000      NA
Split      NA          200.00
Osijek     NA          171.00
Dubrovnik  NA          143.35
Also, what if instead of two CSV files I had two data frames and wanted to do the same? For example, suppose I loaded the first CSV into df1 and the second into df2, and then wanted to merge them into df3, which would look like the example above.
Recommended answer
Why not try the concat function:
In [25]: df1
Out[25]:
     City  Population
0  Zagreb      700000
1  Rijeka      142000

In [26]: df2
Out[26]:
        City    Area
0      Split  200.00
1     Osijek  171.00
2  Dubrovnik  143.35

In [27]: pd.concat([df1,df2])
Out[27]:
     Area       City  Population
0     NaN     Zagreb      700000
1     NaN     Rijeka      142000
0  200.00      Split         NaN
1  171.00     Osijek         NaN
2  143.35  Dubrovnik         NaN

In [28]: pd.concat([df1,df2], ignore_index=True)
Out[28]:
     Area       City  Population
0     NaN     Zagreb      700000
1     NaN     Rijeka      142000
2  200.00      Split         NaN
3  171.00     Osijek         NaN
4  143.35  Dubrovnik         NaN
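
For the CSV case in the question, the same approach can be sketched end to end. This is a minimal sketch, not a definitive implementation: the inline strings csv1 and csv2 are hypothetical stand-ins for the real files (the sample data with its trailing commas removed), and df3 is the name used in the question. Passing sort=False assumes a pandas version that supports that parameter; with it, the columns should appear in order of first appearance (City, Population, Area) rather than alphabetically as in the output above.

```python
import io

import pandas as pd

# Sample data from the question as clean CSV text (assumption: trailing commas dropped).
csv1 = "City,Population\nZagreb,700000\nRijeka,142000\n"
csv2 = "City,Area\nSplit,200.00\nOsijek,171.00\nDubrovnik,143.35\n"

# With real files, pass the file paths to pd.read_csv instead of StringIO objects.
df1 = pd.read_csv(io.StringIO(csv1))
df2 = pd.read_csv(io.StringIO(csv2))

# Stack the rows of both frames. The result has the union of the columns,
# and cells with no source data are filled with NaN.
# ignore_index=True renumbers the rows 0..N+G-1 instead of keeping 0,1,0,1,2.
df3 = pd.concat([df1, df2], ignore_index=True, sort=False)

print(df3)
```

Because Population now contains missing values, pandas will typically store it as a floating-point column, so the values may print as 700000.0 rather than 700000.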