Alternatives to keeping large lists in memory (python)
Question
If I have a list (or array, dictionary, ...) in Python that could exceed the available memory address space (32-bit Python), what are the options and their relative speeds (other than not making a list that large)? The list could exceed the memory, but I have no way of knowing beforehand. Once it starts exceeding 75%, I would like to stop keeping the list in memory (or at least the new items). Is there a way to convert to a file-based approach mid-stream?
What are the best file storage options in terms of read and write speed?
I just need to store a simple list of numbers. There is no need for random access to the Nth element, just append/pop type operations.
Recommended answer
If your "numbers" are simple enough (signed or unsigned integers of up to 4 bytes each, or floats of 4 or 8 bytes each), I recommend the standard library array module as the best way to keep a few million of them in memory (the "tip" of your "virtual array"), with a binary file (open for binary R/W) backing the rest of the structure on disk. array.array has very fast fromfile and tofile methods to facilitate moving data back and forth.
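As a minimal sketch of that round trip (the scratch file and the variable names here are illustrative assumptions, not part of the answer), tofile writes the raw machine values to a binary file and fromfile reads a requested number of items back into an array:

```python
import array
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
fd, path = tempfile.mkstemp(suffix=".dat")
os.close(fd)

numbers = array.array("L", range(10))

# tofile() writes the raw machine values, itemsize bytes per element.
with open(path, "wb") as f:
    numbers.tofile(f)

file_size = os.path.getsize(path)

# fromfile() appends the requested number of items to an existing array.
restored = array.array("L")
with open(path, "rb") as f:
    restored.fromfile(f, len(numbers))

os.remove(path)
```

Because the file holds nothing but fixed-size items, its size is simply the item count times itemsize, which is what makes seeking to a chunk boundary cheap.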
Basically, assuming unsigned-long numbers for example, something like:
    import array
    import os

    # no more than 100 million items in memory at a time
    MAXINMEM = int(1e8)

    class bigarray(object):
        def __init__(self):
            # binary read/write mode, as needed by tofile/fromfile
            self.f = open('afile.dat', 'w+b')
            self.a = array.array('L')

        def append(self, n):
            self.a.append(n)
            # spill exactly MAXINMEM items, the same chunk size pop() reads back
            if len(self.a) >= MAXINMEM:
                self.a.tofile(self.f)
                del self.a[:]

        def pop(self):
            if not len(self.a):
                # in-memory tip is empty: reload the last chunk from disk
                try:
                    self.f.seek(-self.a.itemsize * MAXINMEM, os.SEEK_END)
                except IOError:
                    return self.a.pop()  # ensure normal IndexError &c
                try:
                    self.a.fromfile(self.f, MAXINMEM)
                except EOFError:
                    pass
                self.f.seek(-self.a.itemsize * MAXINMEM, os.SEEK_END)
                self.f.truncate()
            return self.a.pop()
Of course you can add other methods as necessary (e.g. keep track of the overall length, add extend, whatever), but if pop and append are indeed all you need, this should serve.
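One possible way to add the length tracking and extend mentioned above is sketched here. It is a self-contained variant under stated assumptions, not the answer's own code: the class name BigArray, the chunk and path parameters, the on_disk counter, and the use of a temporary file are all introduced purely for illustration.

```python
import array
import os
import tempfile


class BigArray(object):
    """Append/pop stack of unsigned longs that spills full chunks to disk."""

    def __init__(self, chunk=1000000, path=None):
        # `chunk` and `path` are illustrative parameters (assumptions).
        self.chunk = chunk
        self.f = open(path, "w+b") if path else tempfile.TemporaryFile()
        self.a = array.array("L")
        self.on_disk = 0  # number of items currently stored in the file

    def __len__(self):
        return self.on_disk + len(self.a)

    def append(self, n):
        self.a.append(n)
        if len(self.a) >= self.chunk:
            # Write one full chunk at the end of the file and empty the tip.
            self.f.seek(0, os.SEEK_END)
            self.a.tofile(self.f)
            self.on_disk += len(self.a)
            del self.a[:]

    def extend(self, iterable):
        for n in iterable:
            self.append(n)

    def pop(self):
        if not self.a and self.on_disk:
            # Reload the most recently written chunk, then cut it off the file.
            start = (self.on_disk - self.chunk) * self.a.itemsize
            self.f.seek(start)
            self.a.fromfile(self.f, self.chunk)
            self.f.truncate(start)
            self.on_disk -= self.chunk
        return self.a.pop()  # raises IndexError when completely empty


# Usage with a tiny chunk size so that spilling actually happens.
big = BigArray(chunk=4)
big.extend(range(10))
total = len(big)
popped = [big.pop() for _ in range(total)]
```

Keeping an explicit count of items on disk avoids relying on a failed seek to detect an empty file, and it makes len() a constant-time operation.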