How to create a DataFrame while preserving order of the columns?

Problem description

How can I create a DataFrame from multiple numpy arrays, pandas Series, or pandas DataFrames while preserving the order of the columns?

For example, I have these two numpy arrays and I want to combine them as a Pandas DataFrame.

foo = np.array( [ 1, 2, 3 ] )
bar = np.array( [ 4, 5, 6 ] )

If I do this, the bar column would come first because dict doesn't preserve order.

pd.DataFrame( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) } )

    bar foo
0   4   1
1   5   2
2   6   3

I can do this, but it gets tedious when I need to combine many variables.

pd.DataFrame( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) }, columns = [ 'foo', 'bar' ] )

Is there a way to specify the variables to be joined and to set the column order in one operation? I don't mind using multiple lines to complete the entire operation, but I'd rather not have to specify the variables to be joined multiple times (I will be changing the code a lot, and this is pretty error-prone).

One more point. If I want to add or remove one of the variables to be joined, I only want to add/remove in one place.

Recommended answer

Original Solution: Incorrect Usage of collections.OrderedDict

In my original solution, I proposed using OrderedDict from the collections module in Python's standard library.

>>> import numpy as np
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> foo = np.array( [ 1, 2, 3 ] )
>>> bar = np.array( [ 4, 5, 6 ] )
>>>
>>> pd.DataFrame( OrderedDict( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) } ) )

   foo  bar
0    1    4
1    2    5
2    3    6

Right Solution: Passing Key-Value Tuple Pairs for Order Preservation

However, as noted, passing a plain dictionary to OrderedDict may still not preserve the order: the dictionary is constructed first, and its order is already arbitrary by the time OrderedDict receives it. A workaround is to build the OrderedDict from a sequence of key-value tuples instead, as suggested in this SO post:

>>> import numpy as np
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> a = np.array( [ 1, 2, 3 ] )
>>> b = np.array( [ 4, 5, 6 ] )
>>> c = np.array( [ 7, 8, 9 ] )
>>>
>>> pd.DataFrame( OrderedDict( { 'a': pd.Series(a), 'b': pd.Series(b), 'c': pd.Series(c) } ) )

   a  c  b
0  1  7  4
1  2  8  5
2  3  9  6

>>> pd.DataFrame( OrderedDict( (('a', pd.Series(a)), ('b', pd.Series(b)), ('c', pd.Series(c))) ) )

   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9
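
The question also asks that variables be added or removed in only one place. One way to get that, sketched here as an illustration and not as part of the original answer, is to keep the key-value pairs in a single list and build the OrderedDict from it. The name `columns` and the third array `baz` are made up for this example.

```python
from collections import OrderedDict

import numpy as np
import pandas as pd

foo = np.array([1, 2, 3])
bar = np.array([4, 5, 6])
baz = np.array([7, 8, 9])  # hypothetical extra variable for illustration

# The single place to add, remove, or reorder columns
# (`columns` is an illustrative name, not from the original post).
columns = [
    ("foo", foo),
    ("bar", bar),
    ("baz", baz),
]

# An OrderedDict built from a list of pairs keeps the listed order,
# so the DataFrame columns come out as foo, bar, baz.
df = pd.DataFrame(OrderedDict(columns))
print(df)
```

On Python 3.7 and later, plain dicts keep insertion order, and recent pandas versions (0.23 and up, as far as I know) respect that order, so `pd.DataFrame(dict(columns))` or a plain dict literal should give the same column order there. The OrderedDict form is mainly needed on older Python versions, where the reordering shown above occurs.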
