Loading a 2GB text file into memory in Python


Problem Description

In Python 2.7, when I load all the data from a 2.5GB text file into memory for quicker processing, like this:

>>> f = open('dump.xml','r')
>>> dump = f.read()

I got the following error:

Python(62813) malloc: *** mmap(size=140521659486208) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

Why did Python try to allocate 140521659486208 bytes of memory for 2563749237 bytes of data? How do I fix the code so that it loads all the bytes?

I have around 3GB of free RAM. The file is a Wiktionary XML dump.

Recommended Answer

If you use mmap, you'll be able to map the entire file into memory at once; the OS then pages the data in on demand instead of allocating one huge Python string.

import mmap

with open('dump.xml', 'rb') as f:
    # Length 0 maps the ENTIRE file; pages are loaded lazily by the OS
    m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)  # read-only mapping
    # (on Windows, use access=mmap.ACCESS_READ instead of prot)

    # Proceed with your code here -- the file is backed by the mapping,
    # so readline() here will be as fast as it can be
    data = m.readline()
    while data:
        # Do stuff
        data = m.readline()
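
Because the mapping behaves like a bytes object, you can also search and slice it in place instead of reading line by line. A minimal sketch, assuming the MediaWiki-style <page> elements that a Wiktionary dump contains:

import mmap

with open('dump.xml', 'rb') as f:
    m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    # find() scans the mapping directly -- no 2.5GB string is ever built
    start = m.find(b'<page>')               # offset of the first <page> element
    if start != -1:
        end = m.find(b'</page>', start)
        page = m[start:end + len(b'</page>')]  # slicing copies only this element
        print(page[:200])                    # first 200 bytes of that page
    m.close()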

