Pandas read_csv() 1.2GB file out of memory on VM with 140GB RAM

Views: 74 · Published: 2020/5/24 1:59:18 · Tags: python, pandas


Problem description

I am trying to read a 1.2 GB CSV file that contains 25K records, each consisting of an ID and a large string.

However, at around 10K rows, I get this error:

pandas.io.common.CParserError: Error tokenizing data. C error: out of memory

This seems odd, since the VM has 140 GB of RAM and, at 10K rows, memory usage is only about 1%.

This is the command I use:

import pandas as pd
pd.read_csv('file.csv', header=None, names=['id', 'text', 'code'])

I also ran the following dummy program, which successfully filled my memory to nearly 100%.

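items = ["hello"]  # renamed from `list` to avoid shadowing the built-in
while True:
    # Each new string is longer than the previous one, so memory use keeps growing.
    items.append("hello" + items[-1])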

Recommended answer

This sounds like a job for chunksize. It makes read_csv() read the input in multiple chunks rather than all at once, which reduces the memory needed while reading.

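# Collect the chunks and concatenate once; concatenating inside the loop copies the growing frame each time.
chunks = []
for chunk in pd.read_csv('Check1_900.csv', header=None, names=['id', 'text', 'code'], chunksize=1000):
    chunks.append(chunk)
df = pd.concat(chunks, ignore_index=True)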
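The chunked reading described above can be tried on a small in-memory sample before running it on the real file. The following is only a sketch under stated assumptions: the sample data, the chunk size of 10, and the filter on the code column are hypothetical and not taken from the question. It shows that chunksize turns read_csv() into an iterator of DataFrames, and that each chunk can be reduced before it is kept, so the full data never has to sit in memory at once.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: 25 rows of id, text, code.
sample = "\n".join(f"{i},text {i},c{i % 3}" for i in range(25))

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of a single DataFrame.
reader = pd.read_csv(
    io.StringIO(sample),
    header=None,
    names=["id", "text", "code"],
    chunksize=10,
)

chunk_sizes = []
kept = []
for chunk in reader:
    chunk_sizes.append(len(chunk))
    # Reduce each chunk before keeping it (illustrative filter only).
    kept.append(chunk[chunk["code"] == "c0"])

# Concatenate once, after all chunks have been processed.
df = pd.concat(kept, ignore_index=True)
print(chunk_sizes, len(df))
```

If every row is needed, the filter can simply be dropped; the chunks are still read one at a time, but the final DataFrame will then hold the whole file.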
