Trailing delimiter confuses pandas read_csv
Problem description
A CSV (comma-delimited) file whose lines have an extra trailing delimiter seems to confuse pandas.read_csv. (The data file is [1].)
It treats the extra delimiter as if there were an extra column, so each row has one more column than the header names. pandas.read_csv then takes the first column as the row labels. The overall effect is that the columns and headers are no longer aligned: the first column becomes the row labels, the second column is named by the first header, and so on.
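A minimal reproduction of the misalignment, using a small made-up CSV string rather than the FEC file (the sample data is hypothetical): the header has three names, but every data row ends in a comma and therefore has four fields. read_csv then uses the first field as the row index, shifting every value one header to the left and leaving the last column empty.

```python
import io

import pandas as pd

# Hypothetical sample: 3 header names, but every data row has a trailing
# comma, so each row actually contains 4 fields (the last one empty).
data = "a,b,c\n1,2,3,\n4,5,6,\n"

df = pd.read_csv(io.StringIO(data))

# With one more field per row than header names, pandas treats the first
# field as the index, so the values end up under the wrong headers and
# the last column ("c") is all NaN.
print(df)
```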
It is quite annoying. Any idea how to tell pandas.read_csv to do the right thing? I couldn't find a way.
Great book, by the way.
[1]: 2012 FEC Election Database from chapter 9 of the book Python for Data Analysis
Recommended answer
I created a GitHub issue to look into handling this case automatically:
https://github.com/pydata/pandas/issues/2442
I think the FEC file format changed slightly, causing this annoying issue. If you use the file posted at http://github.com/pydata/pydata-book, you hopefully won't have that problem.
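Independently of which version of the file you have, current pandas releases document index_col=False for this situation: it tells read_csv not to use the first column as the index, so the columns line up with the header again and a single trailing delimiter per line is discarded. A sketch using the same kind of made-up sample data (not the FEC file); behavior in the 2012-era pandas of the original question may differ.

```python
import io

import pandas as pd

# Hypothetical sample with a trailing delimiter on every data line.
data = "a,b,c\n1,2,3,\n4,5,6,\n"

# index_col=False stops pandas from promoting the first field to the index;
# the empty field produced by the trailing comma is dropped, and a default
# integer index is used instead.
df = pd.read_csv(io.StringIO(data), index_col=False)
print(df)
```

For the real data file, the same keyword can simply be added to the existing read_csv call.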