Pandas.read_csv() MemoryError


Problem description

I have a 1 GB CSV file with about 10,000,000 (10 million) rows. I need to iterate through the rows to get the max of a few selected rows (based on a condition). The issue is reading the CSV file.

I use the Pandas package for Python. The read_csv() function throws a MemoryError while reading the CSV file. 1) I have tried splitting the file into chunks and reading them; now the concat() function has a memory issue.

tp = pd.read_csv('capture2.csv', iterator=True, chunksize=10000,
                 dtype={'timestamp': float,
                        'vdd_io_soc_i': float, 'vdd_io_soc_v': float,
                        'vdd_io_plat_i': float, 'vdd_io_plat_v': float,
                        'vdd_ext_flash_i': float, 'vdd_ext_flash_v': float,
                        'vsys_i': float, 'vsys_v': float,
                        'vdd_aon_dig_i': float, 'vdd_aon_dig_v': float,
                        'vdd_soc_1v8_i': float, 'vdd_soc_1v8_v': float})

df = pd.concat(tp, ignore_index=True)
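Since only a maximum is needed, the concat() step can be avoided entirely: reduce each chunk as it arrives and keep a running maximum, so no more than one chunk is ever resident in memory. A minimal sketch, assuming a hypothetical condition column and threshold (the actual condition from the question is not shown):

```python
import pandas as pd

def max_where(path_or_buf, value_col, cond_col, threshold, chunksize=10000):
    """Running max of value_col over rows where cond_col > threshold.

    Folds each chunk into a scalar as it is read, so memory use stays
    bounded by chunksize instead of the whole file. Column names and the
    condition are illustrative assumptions, not from the original question.
    """
    running_max = float('-inf')
    for chunk in pd.read_csv(path_or_buf, chunksize=chunksize):
        selected = chunk.loc[chunk[cond_col] > threshold, value_col]
        if not selected.empty:
            running_max = max(running_max, selected.max())
    return running_max
```

Because each chunk is discarded after its maximum is folded in, this sidesteps the second MemoryError that pd.concat() raises when rebuilding the full DataFrame.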

I have used dtype to reduce the memory footprint, but there is still no improvement.

Based on multiple blog posts, I have updated numpy and pandas to the latest versions. Still no luck.

It would be great if anyone has a solution to this issue.

Please note:

  • I have a 64-bit operating system (Windows 7)

  • I am running Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit]

  • I have 4 GB of RAM

  • Numpy is at the latest version (the pip installer says the latest version is installed)

  • Pandas is at the latest version (the pip installer says the latest version is installed)

Recommended answer

Pandas read_csv() has a low-memory flag:

tp = pd.read_csv('capture2.csv', low_memory=True, ...)

The low_memory flag is only available if you use the C parser:

engine : {'c', 'python'}, optional

Parser engine to use. The C engine is faster, while the python engine is currently more feature-complete.
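Putting the two together, a sketch that selects the C engine explicitly along with low_memory; a tiny in-memory buffer stands in for capture2.csv here:

```python
import io
import pandas as pd

# low_memory is honoured only by the C parser, so pass engine='c'
# explicitly. The two-row buffer below is a stand-in for capture2.csv.
buf = io.StringIO("timestamp,vdd_io_soc_v\n0.0,1.8\n0.1,1.8\n")
df = pd.read_csv(buf, engine='c', low_memory=True)
```

With low_memory=True the C parser processes the file in internal chunks, which lowers peak memory during parsing (though the resulting DataFrame still has to fit in memory).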

You can also use the memory_map flag:

memory_map : boolean, default False

If a filepath is provided for filepath_or_buffer, map the file object directly onto memory and access the data directly from there. Using this option can improve performance because there is no longer any I/O overhead.
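A sketch of memory_map in use; it only applies when a real file path is given (not an in-memory buffer), so a throwaway temp file stands in for capture2.csv:

```python
import os
import tempfile
import pandas as pd

# memory_map=True maps the file into memory instead of reading it
# through buffered I/O. A two-row temp file stands in for capture2.csv.
tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.csv', delete=False)
tmp.write("timestamp,vdd_io_soc_i\n0.0,1.5\n0.1,2.5\n")
tmp.close()

df = pd.read_csv(tmp.name, memory_map=True, engine='c',
                 dtype={'timestamp': float, 'vdd_io_soc_i': float})
os.unlink(tmp.name)
```

Note that memory mapping reduces I/O overhead but does not by itself shrink the parsed DataFrame, so it helps speed more than peak memory.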

P.S. Use 64-bit Python (a 32-bit process is limited to roughly 2 GB of addressable memory, which the intermediate buffers of read_csv can easily exhaust on a 1 GB file) - see my comment.
