Concatenate DataFrames with different column ordering
Question
I am parsing data from Excel files, and the columns of the resulting DataFrame may or may not align with those of a base DataFrame onto which I want to stack several parsed DataFrames.
Let's call the DataFrame I parse from the data A, and the base DataFrame df_A.
I read an Excel sheet, resulting in A =
Index AGUB AGUG MUEB MUEB SIL SIL SILB SILB
2012-01-01 00:00:00 0.00 0 0.00 50.78 0.00 0.00 0.00 0.00
2012-01-01 01:00:00 0.00 0 0.00 53.15 0.00 53.15 0.00 0.00
2012-01-01 02:00:00 0.00 0 0.00 0.00 53.15 53.15 53.15 53.15
2012-01-01 03:00:00 0.00 0 0.00 0.00 0.00 55.16 0.00 0.00
2012-01-01 04:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 05:00:00 48.96 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 06:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 07:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 08:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 09:00:00 52.28 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 10:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 11:00:00 36.93 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 12:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 13:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 50.00
2012-01-01 14:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 34.01
2012-01-01 15:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 16:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 17:00:00 53.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 18:00:00 0.00 75 0.00 75.00 0.00 75.00 0.00 0.00
2012-01-01 19:00:00 0.00 70 0.00 70.00 0.00 0.00 0.00 0.00
2012-01-01 20:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 21:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 22:00:00 0.00 0 0.00 0.00 0.00 0.00 0.00 0.00
2012-01-01 23:00:00 0.00 0 53.45 53.45 0.00 0.00 0.00 0.00
I create the base DataFrame:
units = ['MUE', 'MUEB', 'SIL', 'SILB', 'AGUG', 'AGUB', 'MUEBP', 'MUELP']
df_A = pd.DataFrame(columns=units)
df_A = pd.concat([df_A, A], axis=0)
Usually with concat, if A had fewer columns than df_A it would be fine, but in this case the only difference in the columns is the order. The concatenation leads to the following error:
ValueError: Plan shapes are not aligned
I'd like to know how to concatenate the two DataFrames with the column order given by df_A.
Recommended answer
I've tried this, and it doesn't matter whether the source or the target DataFrame has more columns. Either way, the result is a DataFrame whose columns are the union of all supplied columns (columns specified in the target but not populated by the source are filled with NaN).
Where I have been able to reproduce your error is when the column names in either the source or the target DataFrame include a duplicate name (or empty column names).
In your example, several columns (MUEB, SIL and SILB) appear more than once in your source file. I don't think concat copes very well with these kinds of duplicate columns.
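A quick way to check a parsed frame for repeated column labels before concatenating is sketched below. The frame here is a made-up, one-row stand-in for the parsed sheet, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for the parsed sheet: the label "MUEB" occurs twice.
A = pd.DataFrame([[0.0, 50.78, 1.0]], columns=["MUEB", "MUEB", "SIL"])

# keep=False marks every occurrence of a repeated label, not just the later ones.
dup_mask = A.columns.duplicated(keep=False)
dup_labels = A.columns[dup_mask].unique().tolist()

print(A.columns.is_unique)  # False when any label repeats
print(dup_labels)           # labels that occur more than once
```

If `dup_labels` is non-empty, the frame probably needs its columns cleaned up before it is passed to concat. The example that follows reproduces the failure step by step.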
import pandas as pd
s1 = [0,1,2,3,4,5]
s2 = [0,0,0,0,1,1]
A = pd.DataFrame([s2,s1],columns=['A','B','C','D','E','F'])
Result:
A B C D E F
-----------
0 0 0 0 1 1
0 1 2 3 4 5
Take a subset of columns and use them to create a new DataFrame called B:
B = A[['A','C','E']]
A C E
-----
0 0 1
0 2 4
Create a new, empty target DataFrame:
col_names = ['D','A','C','B']
Z = pd.DataFrame(columns=col_names)
D A C B
-------
And concatenate the two:
Z = pd.concat([B,Z],axis=0)
A C D E
0 0 NaN 1
0 2 NaN 4
That works fine!
But if I recreate the empty DataFrame using these columns (note that D appears twice):
col_names = ['D','A','C','D']
Z = pd.DataFrame(columns=col_names)
D A C D
And try to concatenate:
Z = pd.concat([B,Z],axis=0)
Then I get the error you describe.
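One possible way to get the column order of df_A, sketched under assumptions: drop the repeated labels first, then reindex each parsed frame to the base column list before concatenating. Keeping only the first occurrence of a repeated label is an assumption made for illustration; the right rule (first, last, sum, rename) depends on what the duplicated columns in the sheet actually mean. The one-row frame A below is invented sample data.

```python
import pandas as pd

units = ["MUE", "MUEB", "SIL", "SILB", "AGUG", "AGUB", "MUEBP", "MUELP"]

# Hypothetical parsed frame: repeated labels, and an order that differs from `units`.
A = pd.DataFrame(
    [[48.96, 0.0, 0.0, 50.78, 0.0, 53.15]],
    columns=["AGUB", "AGUG", "MUEB", "MUEB", "SIL", "SIL"],
)

# Assumption: keep the first occurrence of each repeated label and drop the rest.
A_unique = A.loc[:, ~A.columns.duplicated()]

# reindex puts the columns in the base order and fills absent ones with NaN.
A_aligned = A_unique.reindex(columns=units)

# Further parsed frames, aligned the same way, would be added to this list.
frames = [A_aligned]
df_A = pd.concat(frames, axis=0)

print(df_A.columns.tolist())
```

Because every frame is reindexed to the same unique column list before the concat, the stacked result keeps the order given by `units` regardless of how each sheet was laid out.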