How to create a DataFrame while preserving order of the columns?
Problem description
How can I create a DataFrame from multiple numpy arrays, Pandas Series, or Pandas DataFrames while preserving the order of the columns?
For example, I have these two numpy arrays and I want to combine them as a Pandas DataFrame.
foo = np.array( [ 1, 2, 3 ] )
bar = np.array( [ 4, 5, 6 ] )
If I do this, the bar column would come first because dict doesn't preserve order.
pd.DataFrame( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) } )
   bar  foo
0    4    1
1    5    2
2    6    3
I can do this, but it gets tedious when I need to combine many variables.
pd.DataFrame( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) }, columns = [ 'foo', 'bar' ] )
Is there a way to specify the variables to be joined and to organize the column order in one operation? That is, I don't mind using multiple lines to complete the entire operation, but I'd rather not have to specify the variables to be joined multiple times (since I will be changing the code a lot and this is pretty error prone).
One more point. If I want to add or remove one of the variables to be joined, I only want to add/remove in one place.
Recommended answer
Original Solution: Incorrect Usage of collections.OrderedDict
In my original solution, I proposed using OrderedDict from the collections module in Python's standard library.
>>> import numpy as np
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> foo = np.array( [ 1, 2, 3 ] )
>>> bar = np.array( [ 4, 5, 6 ] )
>>>
>>> pd.DataFrame( OrderedDict( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) } ) )
   foo  bar
0    1    4
1    2    5
2    3    6
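A minimal sketch of why wrapping a dict literal is not enough: the literal is evaluated into a plain dict before OrderedDict ever sees it, so OrderedDict can only record whatever iteration order that dict already has. The names `plain` and `wrapped` are illustrative only and do not come from the original answer.

```python
from collections import OrderedDict

# The dict literal is built first; OrderedDict only receives the finished dict.
plain = {"foo": 1, "bar": 2}
wrapped = OrderedDict(plain)

# OrderedDict copies the keys in the dict's own iteration order,
# whatever that order happens to be on the running interpreter.
print(list(wrapped) == list(plain))
```

On Python 3.7 and later, plain dicts keep insertion order as a language guarantee, so the scrambled column order shown in this answer only appears on older interpreters.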
Right Solution: Passing Key-Value Tuple Pairs for Order Preservation
However, as noted, if a normal dictionary is passed to OrderedDict, the order may still not be preserved, because the key order is already arbitrary by the time the dictionary has been constructed. A workaround is to build the OrderedDict from a sequence of key-value tuple pairs instead, as suggested in this SO post:
>>> import numpy as np
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> a = np.array( [ 1, 2, 3 ] )
>>> b = np.array( [ 4, 5, 6 ] )
>>> c = np.array( [ 7, 8, 9 ] )
>>>
>>> pd.DataFrame( OrderedDict( { 'a': pd.Series(a), 'b': pd.Series(b), 'c': pd.Series(c) } ) )
   a  c  b
0  1  7  4
1  2  8  5
2  3  9  6
>>> pd.DataFrame( OrderedDict( (('a', pd.Series(a)), ('b', pd.Series(b)), ('c', pd.Series(c))) ) )
   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9
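The question also asks that each variable be named in only one place. One possible way to get that, sketched here as an assumption rather than as part of the original answer, is to keep the name-array pairs in a single list and build the frame from it. The name `pairs` is hypothetical.

```python
from collections import OrderedDict

import numpy as np
import pandas as pd

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.array([7, 8, 9])

# Single place to add, remove, or reorder the variables to combine.
# The order of this list becomes the column order.
pairs = [
    ("a", a),
    ("b", b),
    ("c", c),
]

# OrderedDict built from a list of pairs keeps the list order.
df = pd.DataFrame(OrderedDict(pairs))
print(df)
```

Adding, removing, or reordering a column then means editing only the `pairs` list.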