Preserving column order in Python Pandas DataFrame

Question

Is there a way to preserve the order of the columns in a csv file when read and the write with Python Pandas? For example, in this code

import pandas as pd

data = pd.read_csv(filename)
data.to_csv(filename)

the output files might be different because the columns are not preserved.
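A quick check on a recent pandas release (assumed 1.x or 2.x) suggests the round trip itself keeps the column order: `read_csv` keeps the file's order and `to_csv` writes columns in the DataFrame's order. A common reason the output file still differs is that `to_csv` writes the row index as an extra, unnamed first column by default. The sketch below uses an in-memory buffer and made-up sample data instead of a real `filename`:

```python
import io

import pandas as pd

# Hypothetical sample data with a deliberately non-alphabetical column order.
original = "b,a,c\n5,1,9\n6,2,10\n"

df = pd.read_csv(io.StringIO(original))
print(list(df.columns))  # ['b', 'a', 'c']

# By default to_csv also writes the index as an unnamed first column.
with_index = df.to_csv()
print(with_index.splitlines()[0])  # ,b,a,c

# index=False writes only the data columns, in the DataFrame's order.
# splitlines() is used so the comparison ignores the OS line terminator.
without_index = df.to_csv(index=False)
print(without_index.splitlines() == original.splitlines())  # True
```

With a real file, `data.to_csv(filename, index=False)` is the corresponding call.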

Answer

There appears to be a bug in the current version of pandas (0.11.0), which means that Matti John's answer will not work. If you specify the columns to write to file, the data are still written in alphabetical column order and only the headers are relabelled according to the list in `cols`. For example, this code:

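import pandas

# Build a DataFrame with columns a, b, c.
dfdict = {}
dfdict["a"] = [1, 2, 3, 4]
dfdict["b"] = [5, 6, 7, 8]
dfdict["c"] = [9, 10, 11, 12]
df = pandas.DataFrame(dfdict)

# pandas 0.11.0 API: request the columns in the order b, a, c.
df.to_csv("dfTest.txt", sep="\t", header=True, cols=["b", "a", "c"])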

results in this (incorrect) output:

    b   a   c
0   1   5   9
1   2   6   10
2   3   7   11
3   4   8   12

You can check which version of pandas you have installed by executing:

pandas.__version__

(older releases also exposed this as pandas.version.version).

Documentation for to_csv is here

Actually, it seems that this is a known bug and will be fixed in an upcoming release (0.11.1):

https://github.com/pydata/pandas/issues/3489

UPDATE: There still hasn't been a new release of pandas, but there is a workaround described here, which doesn't require using a different version of pandas:

github.com/pydata/pandas/issues/3454

So changing the last line in the block of code above to the following will work correctly:

df.to_csv("dfTest.txt","\t",header=True,cols=["b","a","c"], engine='python')
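Another option that sidesteps the column-selection argument entirely is to reorder the DataFrame before writing it, so `to_csv` only has to write the columns in the order they already have. This is a sketch written against a recent pandas release; I have not checked it on 0.11.0 itself. It returns the CSV text as a string for inspection; passing `"dfTest.txt"` as the first argument would write the file instead:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]})

# Select the columns in the desired order first; to_csv then writes
# the columns exactly as they appear in the reordered DataFrame.
reordered = df[["b", "a", "c"]]
text = reordered.to_csv(sep="\t", header=True)

lines = text.splitlines()
print(lines[0])  # header: empty index label, then b, a, c (tab-separated)
print(lines[1])  # first row: index 0, then 5, 1, 9 (tab-separated)
```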

UPDATE: It seems that the argument `cols` has been renamed to `columns` and that the argument `engine` is no longer available in recent versions of pandas. Also, this bug was fixed in version 0.19.0.
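On a recent pandas release (assumed here to be 1.x or 2.x), the original example can be written with the renamed `columns` keyword and without `engine`. The sketch below writes to a string and reads it back to confirm that each header now sits above its own data:

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]})

# `columns` (formerly `cols`) selects and orders the written columns.
text = df.to_csv(sep="\t", header=True, columns=["b", "a", "c"])

# Read the text back, using the first column as the index.
roundtrip = pd.read_csv(io.StringIO(text), sep="\t", index_col=0)
print(list(roundtrip.columns))  # ['b', 'a', 'c']
print(roundtrip["b"].tolist())  # [5, 6, 7, 8]
```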
