Reading parts of ~13000 row CSV file with pandas read_csv and nrows


Problem description


I'm trying to read segments of a CSV file into a pandas DataFrame, and I'm running into trouble when I set nrows to more than a certain point. My CSV file is split up into different segments with different headers/types of data, so I've gone through the file and found the line numbers of the different segments, and saved the line numbers. When I try to do:

pd.io.parsers.read_csv('filename',skiprows=40, nrows=12646)

It works fine. Any more rows, and it throws an error:

CParserError: Error tokenizing data. C error: Expected 56 fields in line 13897, saw 71

It's true that line 13897 has that many rows, that's why I'm trying to use nrows and skiprows. I can find the last row that pandas will read and it doesn't look any different from the rest. Looking at the file in a hex editor I still don't see any difference.

I've also tried it with another CSV file, and I get similar results:

pd.io.parsers.read_csv('file2',skiprows=112, nrows=18524)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18188 entries, 0 to 18187

But:

pd.io.parsers.read_csv('file2',skiprows=112, nrows=18525)

gives:

CParserError: Error tokenizing data. C error: Expected 56 fields in line 19190, saw 71

Is there something I'm missing? Is there another way to do this?
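The segment-by-segment approach described above can be sketched as follows. The file contents and segment offsets here are invented for illustration; in current pandas versions, `nrows` correctly stops the parser before it reaches a segment with a different field count:

```python
import pandas as pd
from io import StringIO

# Hypothetical file with two segments that have different headers and widths
raw = StringIO("""x,y
1,2
3,4
a,b,c
5,6,7
8,9,10""")

# (start_line, n_rows) pairs, found by scanning the file beforehand
segments = [(0, 2), (3, 2)]

frames = []
for start, n in segments:
    raw.seek(0)  # rewind the buffer; with a real file, pass the path instead
    # skiprows jumps to the segment's header line; nrows limits how far we parse
    frames.append(pd.read_csv(raw, skiprows=start, nrows=n))
```

Each entry in `frames` is an independent DataFrame with that segment's own header.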

I'm using: pandas-0.10.1.win-amd64-py3.3, numpy-MKL-1.7.1rc1.win-amd64-py3.3, and python-3.3.0.amd64 on Windows. I get the same issue with numpy-unoptimized-1.7.1rc1.win-amd64-py3.3.

Solution

You can use warn_bad_lines and error_bad_lines to turn off the bad-line errors and warnings:

import pandas as pd
from io import StringIO  # the question uses Python 3.3; the Python 2 form was "from StringIO import StringIO"

data = StringIO("""a,b,c
1,2,3
4,5,6
6,7,8,9
1,2,5
3,4,5""")

# The 4-field line ("6,7,8,9") is silently dropped instead of raising CParserError
pd.read_csv(data, warn_bad_lines=False, error_bad_lines=False)
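A note for readers on current pandas: warn_bad_lines and error_bad_lines were deprecated in pandas 1.3 and removed in 2.0. The replacement is the on_bad_lines parameter. A minimal sketch, assuming pandas >= 1.3:

```python
import pandas as pd
from io import StringIO

data = StringIO("""a,b,c
1,2,3
4,5,6
6,7,8,9
1,2,5
3,4,5""")

# on_bad_lines="skip" drops rows whose field count doesn't match the header
df = pd.read_csv(data, on_bad_lines="skip")
```

The resulting DataFrame contains the four well-formed rows; the line `6,7,8,9` is discarded.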
