Save numpy array in append mode
Question
Is it possible to save a numpy array by appending it to an already existing npy file, with something like np.save(filename, arr, mode='a')?
I have several functions that have to iterate over the rows of a large array. I cannot create the array all at once because of memory constraints. To avoid creating the rows over and over again, I want to create each row once and save it to a file, appending it after the previous row. Later I could load the npy file in mmap_mode and access slices when needed.
Recommended answer
The built-in .npy file format is perfectly fine for working with small datasets, without relying on external modules other than numpy.
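As a side note on the .npy route from the question: if the final number of rows is known in advance, the file can be preallocated and filled row by row instead of appended to. The following is only a minimal sketch under that assumption; the sizes, the temporary path, and the stand-in row generator are hypothetical and not part of the original answer.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

# Hypothetical sizes, chosen only for illustration.
NUM_ROWS = 50
ROW_SIZE = 100

path = os.path.join(tempfile.mkdtemp(), "rows.npy")

# Create a valid .npy file of the final shape on disk without holding it in RAM.
out = open_memmap(path, mode="w+", dtype=np.float64, shape=(NUM_ROWS, ROW_SIZE))
for i in range(NUM_ROWS):
    # Stand-in for an expensive row computation: row i is filled with the value i.
    out[i] = np.full(ROW_SIZE, i, dtype=np.float64)
out.flush()
del out  # release the memory map

# Later: map the file read-only and copy only the slice that is needed.
data = np.load(path, mmap_mode="r")
block = np.array(data[10:13, :5])
```

This does not grow the file: the shape is fixed when the file is created, so it does not cover the case where the row count is unknown.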
However, when you start having large amounts of data, a file format designed to handle such datasets, such as HDF5, is preferable [1].
For example, here is a solution that uses PyTables.
Step 1: Create an extendable EArray storage
import numpy as np
import tables

filename = 'outarray.h5'
ROW_SIZE = 100      # elements per row
NUM_COLUMNS = 200   # number of rows appended in the loop below

f = tables.open_file(filename, mode='w')
atom = tables.Float64Atom()

# A 0 in the first dimension marks it as the extendable one.
array_c = f.create_earray(f.root, 'data', atom, (0, ROW_SIZE))

for idx in range(NUM_COLUMNS):
    x = np.random.rand(1, ROW_SIZE)  # one row, shape (1, ROW_SIZE)
    array_c.append(x)
f.close()
Step 2: Append rows to an existing dataset (if needed)
f = tables.open_file(filename, mode='a')
f.root.data.append(x)  # x must have shape (n, ROW_SIZE)
f.close()
Step 3: Read back a subset of the data
f = tables.open_file(filename, mode='r')
print(f.root.data[1:10, 2:20])  # e.g. read from disk only this part of the dataset
f.close()
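The three steps can also be written with context managers, so the file is closed even if a step raises. This is a hedged sketch rather than the answer's own code: the temporary path, the NUM_ROWS name, and the deterministic stand-in rows are assumptions made so the result is easy to check.

```python
import os
import tempfile

import numpy as np
import tables

ROW_SIZE = 100
NUM_ROWS = 200  # hypothetical row count for this sketch
h5_path = os.path.join(tempfile.mkdtemp(), "outarray.h5")

# Step 1: create the extendable array and append one row per iteration.
with tables.open_file(h5_path, mode="w") as f:
    earray = f.create_earray(f.root, "data", tables.Float64Atom(), (0, ROW_SIZE))
    for i in range(NUM_ROWS):
        # Stand-in row: every element of row i equals i.
        earray.append(np.full((1, ROW_SIZE), i, dtype=np.float64))

# Step 2: reopen in append mode and add one more row.
with tables.open_file(h5_path, mode="a") as f:
    f.root.data.append(np.zeros((1, ROW_SIZE)))

# Step 3: read a slice, and iterate over a few rows without loading the rest.
with tables.open_file(h5_path, mode="r") as f:
    total_rows = f.root.data.nrows
    subset = f.root.data[1:10, 2:20]
    row_sums = [float(row.sum()) for row in f.root.data.iterrows(0, 3)]
```

Iterating with iterrows reads rows from disk as they are requested, which fits the question's goal of walking over the rows of an array that does not fit in memory.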