Alternatives to keeping large lists in memory (python)

Question

If I have a list (or array, dictionary, ...) in Python that could exceed the available memory address space (32-bit Python), what are the options and their relative speeds (other than not making a list that large)? The list could exceed the memory, but I have no way of knowing beforehand. Once it starts exceeding 75%, I would like to no longer keep the list in memory (or at least the new items). Is there a way to convert to a file-based approach mid-stream?

What are the best file storage options in terms of read and write speed?

I just need to store a simple list of numbers. There is no need for random access to the Nth element, just append/pop type operations.

Recommended answer

If your "numbers" are simple enough (signed or unsigned integers of up to 4 bytes each, or floats of 4 or 8 bytes each), I recommend the standard library array module as the best way to keep a few million of them in memory (the "tip" of your "virtual array"), with a binary file (open for binary R/W) backing the rest of the structure on disk. array.array has very fast fromfile and tofile methods to facilitate moving data back and forth.
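As a minimal sketch of that round trip (the file name `numbers.bin` and the sample values are placeholders, not part of the original answer), `tofile` writes the raw machine values and `fromfile` reads a given number of them back. Note that the item size of typecode 'L' is platform-dependent (4 or 8 bytes), so such a file should be read back on the same platform that wrote it.

```python
import array
import os
import tempfile

# Placeholder path for illustration only.
path = os.path.join(tempfile.mkdtemp(), "numbers.bin")

tip = array.array("L", [10, 20, 30, 40])

# The file must be opened in binary mode; tofile writes raw machine values.
with open(path, "w+b") as f:
    tip.tofile(f)
    del tip[:]                      # free the in-memory copy

    # Seek back over the last two items and read only those.
    f.seek(-2 * tip.itemsize, os.SEEK_END)
    tip.fromfile(f, 2)

print(list(tip))  # [30, 40]
```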

Concretely, assuming, for example, unsigned-long numbers, something like:

import array
import os

# no more than 100 million items in memory at a time
MAXINMEM = int(1e8)

class bigarray(object):
  def __init__(self):
    # binary read/write mode: tofile/fromfile work on raw bytes
    self.f = open('afile.dat', 'w+b')
    self.a = array.array('L')
  def append(self, n):
    self.a.append(n)
    # spill chunks of exactly MAXINMEM items, so pop can reload whole chunks
    if len(self.a) >= MAXINMEM:
      self.a.tofile(self.f)
      del self.a[:]
  def pop(self):
    if not len(self.a):
      # in-memory tip is empty: reload the last chunk from the file
      try: self.f.seek(-self.a.itemsize * MAXINMEM, os.SEEK_END)
      except (IOError, OSError): return self.a.pop()  # ensure normal IndexError &c
      try: self.a.fromfile(self.f, MAXINMEM)
      except EOFError: pass
      # drop the reloaded chunk from the end of the file
      self.f.seek(-self.a.itemsize * MAXINMEM, os.SEEK_END)
      self.f.truncate()
    return self.a.pop()

Of course you can add other methods as necessary (e.g. keep track of the overall length, add extend, whatever), but if pop and append are indeed all you need, this should serve.
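One way those extra methods might look is sketched below. This is a hypothetical variant, not the answerer's code: the class name `SpillStack`, the `chunk` parameter, and the `on_disk` counter are invented for illustration, and a temporary file with a tiny chunk size is used so the behavior is easy to check.

```python
import array
import os
import tempfile


class SpillStack(object):
    """Hypothetical variant of the bigarray idea; all names are illustrative."""

    def __init__(self, chunk=4, typecode="L"):
        self.chunk = chunk                    # max items kept in memory
        self.f = tempfile.TemporaryFile()     # opened as 'w+b' by default
        self.a = array.array(typecode)
        self.on_disk = 0                      # items currently in the file

    def __len__(self):
        return self.on_disk + len(self.a)

    def append(self, n):
        self.a.append(n)
        if len(self.a) >= self.chunk:
            self.a.tofile(self.f)             # spill one full chunk
            self.on_disk += len(self.a)
            del self.a[:]

    def extend(self, items):
        for n in items:
            self.append(n)

    def pop(self):
        if not self.a and self.on_disk:
            nbytes = self.a.itemsize * self.chunk
            self.f.seek(-nbytes, os.SEEK_END)
            self.a.fromfile(self.f, self.chunk)   # reload the last chunk
            self.f.seek(-nbytes, os.SEEK_END)
            self.f.truncate()                     # and drop it from the file
            self.on_disk -= self.chunk
        return self.a.pop()                   # IndexError when truly empty


s = SpillStack(chunk=4)
s.extend(range(10))
print(len(s))                        # 10
print([s.pop() for _ in range(10)])  # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```

Tracking the on-disk count explicitly avoids relying on a failed seek to detect an empty file, at the cost of one extra integer of state.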
