Alternatives to keeping large lists in memory (python)

Question

If I have a list (or array, dictionary, ...) in Python that could exceed the available memory address space (32-bit Python), what are the options and their relative speeds (other than not making a list that large)? The list could exceed memory, but I have no way of knowing beforehand. Once it starts exceeding 75%, I would like to stop keeping the list in memory (or at least the new items). Is there a way to switch to a file-based approach mid-stream?

What are the best file storage options in terms of speed (both writing in and reading out)?

I just need to store a simple list of numbers. There is no need for random access to the Nth element, only append/pop-type operations.

Answer

If your "numbers" are simple enough (signed or unsigned integers of up to 4 bytes each, or floats of 4 or 8 bytes each), I recommend the standard library array module as the best way to keep a few million of them in memory (the "tip" of your "virtual array"), with a binary file (opened for binary read/write) backing the rest of the structure on disk. array.array has very fast fromfile and tofile methods that make it easy to move data back and forth.
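A minimal sketch of the tofile/fromfile round trip, assuming Python 3; the temporary file and the sample values are purely illustrative. Because tofile writes the items as raw machine values with no per-item overhead, the number of items on disk is just the file size divided by itemsize, and fromfile can read an exact count back starting from any item boundary, such as a whole number of items before the end of the file.

```python
import array
import os
import tempfile

# 'd' stores C doubles; the sample values are illustrative only
values = array.array('d', [1.5, 2.5, 3.5, 4.5])

with tempfile.TemporaryFile() as f:  # TemporaryFile opens in binary read/write mode ('w+b')
    values.tofile(f)                 # raw machine values, no per-item overhead
    size = f.tell()                  # bytes written so far
    count = size // values.itemsize  # number of items stored on disk

    # read back only the last two items, counting back from the end of the file
    f.seek(-2 * values.itemsize, os.SEEK_END)
    tail = array.array('d')
    tail.fromfile(f, 2)

print(count, tail.tolist())
```

This is the same bookkeeping a disk-backed pop relies on: seek back a whole number of items from the end, read them into memory, then truncate the file at that point.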

For example, assuming unsigned long numbers, something like:

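import array
import os

# no more than 100 million items in memory at a time
MAXINMEM = int(1e8)

class bigarray(object):
    def __init__(self):
        # tofile/fromfile need the file opened in binary read/write mode
        self.f = open('afile.dat', 'w+b')
        self.a = array.array('L')

    def append(self, n):
        self.a.append(n)
        if len(self.a) > MAXINMEM:
            # spill the in-memory "tip" to the end of the file
            self.a.tofile(self.f)
            del self.a[:]

    def pop(self):
        if not len(self.a):
            # refill the tip from the last items on disk; read fewer than
            # MAXINMEM if that is all that remains, so nothing is lost
            self.f.seek(0, os.SEEK_END)
            size = self.f.tell()
            count = min(size // self.a.itemsize, MAXINMEM)
            if count:
                start = size - count * self.a.itemsize
                self.f.seek(start)
                self.a.fromfile(self.f, count)
                # drop the chunk that is now held in memory
                self.f.seek(start)
                self.f.truncate()
        return self.a.pop()  # normal IndexError if everything is empty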

Of course, you can add other methods as needed (e.g., tracking the overall length, adding extend, and so on), but if pop and append really are all you need, this should serve.
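As one hypothetical way to add those extras, the sketch below keeps a running total so that len() counts items on disk as well as in memory, and implements extend as repeated append. It assumes Python 3; the class name SpillStack and the path, typecode and per-instance maxinmem parameters are illustrative choices, not part of the answer above. A tiny cap (maxinmem=3) makes the spilling and refilling easy to follow.

```python
import array
import os
import tempfile


# Hypothetical class name and parameters (path, typecode, maxinmem); not from the answer above.
class SpillStack:
    """Append/pop stack that spills its in-memory tip to a binary file."""

    def __init__(self, path, typecode='L', maxinmem=int(1e8)):
        self.f = open(path, 'w+b')      # binary read/write backing file
        self.a = array.array(typecode)  # in-memory "tip"
        self.maxinmem = maxinmem
        self.length = 0                 # items in memory plus items on disk

    def __len__(self):
        return self.length

    def append(self, n):
        self.a.append(n)
        self.length += 1
        if len(self.a) > self.maxinmem:
            self.a.tofile(self.f)       # write the tip at the end of the file
            del self.a[:]

    def extend(self, iterable):
        for n in iterable:
            self.append(n)

    def pop(self):
        if not self.a:
            # refill the tip from the last chunk on disk, if there is one
            self.f.seek(0, os.SEEK_END)
            size = self.f.tell()
            count = min(size // self.a.itemsize, self.maxinmem)
            if count:
                start = size - count * self.a.itemsize
                self.f.seek(start)
                self.a.fromfile(self.f, count)
                self.f.seek(start)
                self.f.truncate()       # drop the chunk now held in memory
        n = self.a.pop()                # IndexError if the stack is empty
        self.length -= 1
        return n

    def close(self):
        self.f.close()


with tempfile.TemporaryDirectory() as d:
    s = SpillStack(os.path.join(d, 'afile.dat'), maxinmem=3)
    s.extend(range(10))
    print(len(s))                       # 10 (memory + disk)
    popped = [s.pop() for _ in range(len(s))]
    s.close()                           # close before the directory is removed
print(popped)                           # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```

Items come back in last-in, first-out order even across chunk boundaries, because each refill reads the most recently written items from the end of the file and the truncate leaves the file position at the new end, ready for the next spill.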
