Best way to preserve numpy arrays on disk
Question
I am looking for a fast way to preserve large numpy arrays. I want to save them to disk in a binary format, then read them back into memory relatively quickly. Unfortunately, cPickle is not fast enough.
I found numpy.savez and numpy.load. But the weird thing is that numpy.load seems to load an .npz file lazily, like a "memory map", which makes regular manipulation of the arrays really slow. For example, something like this would be really slow:
#!/usr/bin/env python3
import time
from tempfile import TemporaryFile

import numpy as np

n = 10000000
a = np.arange(n)
b = np.arange(n) * 10
c = np.arange(n) * -0.5

# Save all three arrays into a single (uncompressed) .npz archive
file = TemporaryFile()
np.savez(file, a=a, b=b, c=c)
file.seek(0)

t = time.time()
z = np.load(file)  # returns a lazy NpzFile; no array data is read yet
print("loading time =", time.time() - t)

t = time.time()
aa = z['a']  # each key access reads that array out of the zip archive
bb = z['b']
cc = z['c']
print("assigning time =", time.time() - t)
More precisely, the np.load line is really fast, but the remaining lines that assign the arrays to variables are ridiculously slow:
loading time = 0.000220775604248
assigning time = 2.72940087318
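These timings are consistent with how np.load treats .npz archives: it returns a lazy NpzFile that only opens the zip, and every z['name'] lookup reads that member from the archive again (there is no caching). A minimal sketch, assuming the goal is just to pull every array into memory exactly once; the helper name load_all_npz is hypothetical:

```python
import io

import numpy as np


def load_all_npz(source):
    """Read every array in an .npz archive into an in-memory dict.

    `source` may be a path or a file-like object positioned at the start
    of the archive. `load_all_npz` is a hypothetical helper name.
    """
    # np.load on an .npz archive returns a lazy NpzFile: nothing is read
    # until a key is accessed, and each access reads that member again.
    with np.load(source) as z:
        # Copy each member exactly once; the returned arrays remain valid
        # after the `with` block closes the archive.
        return {name: z[name] for name in z.files}


if __name__ == "__main__":
    buf = io.BytesIO()
    np.savez(buf, a=np.arange(5), b=np.arange(5) * 10)
    buf.seek(0)
    arrays = load_all_npz(buf)
    print(sorted(arrays))  # ['a', 'b']
```

This does not make the underlying read faster; it only makes sure the cost is paid once per array instead of once per access.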
Is there any better way of preserving numpy arrays? Ideally, I want to be able to store multiple arrays in one file.
Recommended answer
I'm a big fan of hdf5 for storing large numpy arrays. There are two options for dealing with hdf5 in python:
- PyTables
- h5py
Both are designed to work with numpy arrays efficiently.
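A minimal sketch of storing several arrays in one HDF5 file with h5py, assuming h5py is installed; the file name arrays.h5 and the helper names save_arrays/load_arrays are illustrative, not part of any library. PyTables provides comparable functionality through its own API.

```python
import os
import tempfile

import h5py
import numpy as np


def save_arrays(path, **arrays):
    """Write each keyword argument as a named dataset in one HDF5 file."""
    with h5py.File(path, "w") as f:
        for name, arr in arrays.items():
            f.create_dataset(name, data=arr)


def load_arrays(path, names=None):
    """Read the requested datasets (default: all) fully into memory."""
    with h5py.File(path, "r") as f:
        if names is None:
            names = list(f.keys())
        # `[()]` reads the whole dataset into a NumPy array; without it you
        # get an h5py Dataset that is only usable while the file is open.
        return {name: f[name][()] for name in names}


if __name__ == "__main__":
    n = 1_000_000
    path = os.path.join(tempfile.mkdtemp(), "arrays.h5")
    save_arrays(path, a=np.arange(n), b=np.arange(n) * 10, c=np.arange(n) * -0.5)
    loaded = load_arrays(path, ["a", "c"])
    print(loaded["c"][:3])
```

Because datasets are read on demand, slicing inside an open file (for example f["a"][:1000]) reads only that part from disk, which helps when you need just a piece of a large array.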