Alternatives to keeping large lists in memory (python)
Question
If I have a list (or array, dictionary...) in Python that could exceed the available memory address space (32-bit Python), what are the options and their relative speeds (other than not making a list that large)? The list could exceed the memory, but I have no way of knowing beforehand. Once it starts exceeding 75%, I would like to no longer keep the list in memory (or the new items, anyway). Is there a way to convert to a file-based approach mid-stream?
What are the best (speed in and out) file storage options?
I just need to store a simple list of numbers. There is no need for random access to the Nth element, just append/pop type operations.
Recommended answer
If your "numbers" are simple enough (signed or unsigned integers of up to 4 bytes each, or floats of 4 or 8 bytes each), I recommend the standard library array module as the best way to keep a few million of them in memory (the "tip" of your "virtual array"), with a binary file (open for binary R/W) backing the rest of the structure on disk. array.array has very fast fromfile and tofile methods to facilitate moving data back and forth.
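As a minimal sketch of that round trip (the temporary file and the sample values are assumptions made for illustration, not part of the original answer), tofile writes the raw machine values and fromfile reads a given number of items back:

```python
import array
import os
import tempfile

# Hypothetical scratch file; any file opened in binary read/write mode works.
fd, path = tempfile.mkstemp(suffix=".dat")
os.close(fd)

numbers = array.array("L", [1, 2, 3, 4, 5])

with open(path, "w+b") as f:
    numbers.tofile(f)                   # write the raw machine values
    f.seek(0)                           # rewind before reading back
    restored = array.array("L")
    restored.fromfile(f, len(numbers))  # read exactly that many items

os.remove(path)
```

Note that the size of an "L" item is platform dependent (check array.array("L").itemsize), so a file written this way is only meant to be read back on the same kind of machine.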
Basically, assuming for example unsigned-long numbers, the whole structure could look something like:
import array
import os

# no more than 100 million items in memory at a time
MAXINMEM = int(1e8)

class bigarray(object):
    def __init__(self):
        # binary read/write mode; note that 'w+b' truncates an existing afile.dat
        self.f = open('afile.dat', 'w+b')
        self.a = array.array('L')

    def append(self, n):
        self.a.append(n)
        # spill exactly MAXINMEM items, so pop can read back whole chunks
        if len(self.a) >= MAXINMEM:
            self.a.tofile(self.f)
            del self.a[:]

    def pop(self):
        if not len(self.a):
            chunk = self.a.itemsize * MAXINMEM
            # seeking before the start of the file (no full chunk on disk) raises IOError
            try: self.f.seek(-chunk, os.SEEK_END)
            except IOError: return self.a.pop()  # ensure normal IndexError &c
            self.a.fromfile(self.f, MAXINMEM)
            # drop the chunk just read from the end of the file
            self.f.seek(-chunk, os.SEEK_END)
            self.f.truncate()
        return self.a.pop()
Of course you can add other methods as necessary (e.g. keep track of the overall length, add extend, whatever), but if pop and append are indeed all you need, this should serve.
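One possible way to add the length tracking and extend mentioned above is sketched here. The class name SpillingStack, the path and max_in_mem parameters, the on_disk counter and the close method are all hypothetical choices for this illustration, and the chunk size is kept tiny only so the spill to disk is easy to observe:

```python
import array
import os
import tempfile


class SpillingStack(object):
    """Append/pop stack of unsigned longs that spills full chunks to disk."""

    def __init__(self, path, max_in_mem=4):
        self.max_in_mem = max_in_mem  # tiny default, for demonstration only
        self.f = open(path, "w+b")    # truncates any existing file
        self.a = array.array("L")
        self.on_disk = 0              # number of items currently in the file

    def __len__(self):
        return self.on_disk + len(self.a)

    def append(self, n):
        self.a.append(n)
        if len(self.a) >= self.max_in_mem:
            self.a.tofile(self.f)     # spill one full chunk
            self.on_disk += len(self.a)
            del self.a[:]

    def extend(self, iterable):
        for n in iterable:
            self.append(n)

    def pop(self):
        if not self.a and self.on_disk:
            chunk = self.a.itemsize * self.max_in_mem
            self.f.seek(-chunk, os.SEEK_END)
            self.a.fromfile(self.f, self.max_in_mem)
            self.f.seek(-chunk, os.SEEK_END)
            self.f.truncate()         # drop the chunk just read back
            self.on_disk -= self.max_in_mem
        return self.a.pop()           # raises IndexError when empty

    def close(self):
        self.f.close()


# Hypothetical usage with a scratch file.
fd, path = tempfile.mkstemp(suffix=".dat")
os.close(fd)
stack = SpillingStack(path, max_in_mem=4)
stack.extend(range(10))
total = len(stack)
popped = [stack.pop() for _ in range(10)]
remaining = len(stack)
stack.close()
os.remove(path)
```

In this sketch, alternating pop and append right at a chunk boundary rewrites the same chunk repeatedly; keeping a little headroom in memory before spilling would be one way to soften that, at the cost of a more involved pop.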