Why is pandas read_csv not reading the right number of rows?


Problem description

I'm trying to open part of a csv file using pandas read_csv. The section I am opening has a header on line 746, and goes to line 1120.

 gr = read_csv(inputfile, header=746, nrows=374, index_col=False)

I then get an error

CParserError: Error tokenizing data. C error: Expected 9 fields in line 1121, saw 17

The error makes sense, because in line 1121 of the file, the data changes from 9 fields to 17. What doesn't make sense is why it is trying to read line 1121, as the nrows and header should only open lines up to 1120.
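One way to confirm where the field count changes is to scan the file with the standard `csv` module. This is only a rough sketch: the helper name `field_count_changes` is made up for illustration, and it assumes Python 3 and a plain comma-delimited file.

```python
import csv


def field_count_changes(path):
    """Return (line_number, field_count) for each row whose field count
    differs from the row before it. Line numbers are 1-indexed."""
    changes = []
    previous = None
    with open(path, newline="") as f:
        reader = csv.reader(f)
        for row in reader:
            if len(row) != previous:
                # reader.line_num is the number of physical lines read so far.
                changes.append((reader.line_num, len(row)))
                previous = len(row)
    return changes
```

For a file like the one described here, the result should contain an entry for the line where the rows switch from 9 fields to 17.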

I can get it to work by decreasing the number of rows to below 232. This still works even if I increase the header number so that reading starts further down the file (e.g. increasing it to 800).

There doesn't seem to be anything special about the last line it will read, and it will read lines further in the file if I increase the header number.

I am using Python 2.7 and pandas 0.14.

The file I am trying to read looks like:

"River Levels","GRETA_SOUTH      (C)","GLENROWAN        (C)","ROCKY_POINT      (C)","DOCKER_RD        (C)","BOBINAWARRAH     (C)","WOOLSHED         (C)","WANGARATTA       (C)","PEECHELBA_EAST   (C)"
 41812.00001,          0.70,          0.00,          0.00,          0.20,          0.00,          0.00,          7.30,        125.00
 41812.04168,          0.70,          0.00,          0.00,          0.20,          0.00,          0.00,          7.30,        125.00

Why is it trying to open line 1121, when nrows+header is less than this, and why will it only read 232 lines before it does this?

Solution

Unless I'm reading the docs wrong, this looks like a bug in read_csv (I recommend filing an issue on GitHub!).

A workaround, since your data is smallish (read in the lines as a string):

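import pandas as pd
from StringIO import StringIO  # Python 2.7; on Python 3 use: from io import StringIO

with open(inputfile) as f:
    # Keep only the first 1120 lines so the parser never sees the 17-field rows.
    df = pd.read_csv(StringIO(''.join(f.readlines()[:1120])), header=746, nrows=374)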

I tested this with the CSV you provided and it works without raising an error.
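If the file were larger, a variation on the same idea would be to hand pandas only the lines of interest instead of reading the whole file first. The sketch below is not part of the tested workaround: `read_section` is a hypothetical helper, it assumes Python 3 (`io.StringIO`), and it treats `header_line` as a 0-indexed physical line number in the file.

```python
from io import StringIO
from itertools import islice

import pandas as pd


def read_section(path, header_line, nrows):
    """Read `nrows` data rows following the header on physical line
    `header_line` (0-indexed), ignoring everything after them."""
    with open(path) as f:
        # islice stops after the last wanted line, so later lines
        # (with a different field count) are never handed to the parser.
        wanted = islice(f, header_line, header_line + nrows + 1)
        buffer = StringIO("".join(wanted))
    # The header is now the first line of the buffer, hence header=0.
    return pd.read_csv(buffer, header=0, index_col=False)
```

For the file in the question this would be called as `read_section(inputfile, header_line=746, nrows=374)`, assuming `header=746` there refers to the same physical line.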
