How do I read a large csv file with pandas?
Question
I am trying to read a large csv file (approx. 6 GB) in pandas and I am getting a memory error:
MemoryError Traceback (most recent call last)
<ipython-input-58-67a72687871b> in <module>()
----> 1 data=pd.read_csv('aphro.csv',sep=';')
...
MemoryError:
Any help on this?
Recommended answer
The error shows that the machine does not have enough memory to read the entire CSV into a DataFrame at one time. Assuming you do not need the entire dataset in memory all at one time, one way to avoid the problem would be to process the CSV in chunks (by specifying the chunksize parameter):
chunksize = 10 ** 6
for chunk in pd.read_csv(filename, chunksize=chunksize):
    process(chunk)
The chunksize parameter specifies the number of rows per chunk. (The last chunk may contain fewer than chunksize rows, of course.)
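As a sketch of what a concrete process(chunk) might do, the loop below accumulates a running row count and column sum across chunks, so only one chunk is ever held in memory at a time. The column name and the in-memory CSV here are invented for illustration; in practice you would pass a path to your real file.

```python
import io

import pandas as pd

# A small in-memory CSV standing in for a large file on disk
# (the "value" column is made up for this example).
csv_data = "value\n" + "\n".join(str(i) for i in range(10))

total = 0
rows = 0
# Read 4 rows at a time; each chunk is an ordinary DataFrame.
for chunk in pd.read_csv(io.StringIO(csv_data), chunksize=4):
    total += chunk["value"].sum()
    rows += len(chunk)

print(rows, total)  # 10 rows in all, summing 0 + 1 + ... + 9 = 45
```

Any per-chunk reduction (counts, sums, value_counts merged afterwards) follows the same pattern.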
read_csv with chunksize returns a context manager, to be used like so:
chunksize = 10 ** 6
with pd.read_csv(filename, chunksize=chunksize) as reader:
    for chunk in reader:
        process(chunk)
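A common use of this pattern is filtering: keep only the matching rows from each chunk and concatenate them at the end, so only the filtered subset accumulates in memory rather than the full file. This sketch uses an in-memory CSV with invented column names; the context-manager form of read_csv requires pandas 1.2 or later.

```python
import io

import pandas as pd

# In-memory CSV standing in for a large file (columns invented
# for illustration).
csv_data = "id,score\n" + "\n".join(f"{i},{i * 10}" for i in range(8))

kept = []
with pd.read_csv(io.StringIO(csv_data), chunksize=3) as reader:
    for chunk in reader:
        # Keep only rows passing the filter; discard the rest of the chunk.
        kept.append(chunk[chunk["score"] >= 50])

result = pd.concat(kept, ignore_index=True)
print(len(result))  # rows with id 5, 6, 7 -> 3 rows
```

The same structure works for any per-chunk transformation whose combined output is small enough to fit in memory.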
See GH38225.