Save a numpy array in append mode


Question

Is it possible to save a numpy array by appending it to an already existing npy file, with something like np.save(filename, arr, mode='a')?

I have several functions that have to iterate over the rows of a large array. I cannot create the array all at once because of memory constraints. To avoid creating the rows over and over again, I want to create each row once and save it to a file, appending it after the previous row. Later I could load the npy file in mmap_mode and access slices when needed.

Recommended answer

The built-in .npy file format is perfectly fine for working with small datasets, and it needs no external modules other than numpy.
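To make the limitation concrete: np.save takes no mode argument, so growing a .npy file means loading it, concatenating the new rows, and writing the whole array again. The following is a minimal sketch of that workaround, not part of the original answer; the helper name append_rows_npy and the file name rows.npy are hypothetical. Every call rewrites the entire file, so this is only reasonable while the array fits comfortably in memory.

```python
import os
import tempfile

import numpy as np


def append_rows_npy(path, rows):
    """Append rows to a .npy file by rewriting it (hypothetical helper)."""
    rows = np.atleast_2d(rows)
    if os.path.exists(path):
        # The whole existing array is read into memory here.
        existing = np.load(path)
        rows = np.concatenate([existing, rows], axis=0)
    # np.save has no append mode; it always writes a complete file.
    np.save(path, rows)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "rows.npy")
    for _ in range(3):
        append_rows_npy(path, np.random.rand(1, 100))
    # Memory-map the result so only the accessed slices are read.
    data = np.load(path, mmap_mode="r")
    print(data.shape)  # (3, 100)
```

The cost of each append grows with the size of the file, which is why this approach stops being practical for large arrays.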

However, once you have large amounts of data, a file format designed to handle such datasets, such as HDF5, is preferable [1].

For example, here is a solution that saves numpy arrays to HDF5 using PyTables.

Step 1: Create an extendable EArray storage

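import tables
import numpy as np

filename = 'outarray.h5'
ROW_SIZE = 100      # number of values in each row
NUM_COLUMNS = 200   # despite its name, the number of rows appended below

f = tables.open_file(filename, mode='w')
atom = tables.Float64Atom()

# The 0 in the shape marks the first dimension as extendable.
array_c = f.create_earray(f.root, 'data', atom, (0, ROW_SIZE))

# Append one row at a time; each block must have shape (n, ROW_SIZE).
for idx in range(NUM_COLUMNS):
    x = np.random.rand(1, ROW_SIZE)
    array_c.append(x)
f.close()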

Step 2: Append rows to an existing dataset (if needed)

f = tables.open_file(filename, mode='a')
f.root.data.append(x)  # x must have shape (n, ROW_SIZE)
f.close()  # close the file before reopening it in another mode

Step 3: Read back a subset of the data

f = tables.open_file(filename, mode='r')
print(f.root.data[1:10, 2:20])  # e.g. read from disk only this part of the dataset
f.close()
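The three steps can also be wrapped in small functions that use context managers, so the file is closed even if an exception is raised. This is a sketch rather than part of the original answer: the function names create_store, append_rows, and read_block are hypothetical, and the node name 'data' simply mirrors the steps above.

```python
import os
import tempfile

import numpy as np
import tables


def create_store(path, row_size):
    """Create an empty, extendable EArray named 'data' (hypothetical helper)."""
    with tables.open_file(path, mode="w") as f:
        # A 0 in the shape marks the dimension that can grow.
        f.create_earray(f.root, "data", tables.Float64Atom(), (0, row_size))


def append_rows(path, rows):
    """Append an (n, row_size) block of rows to the existing dataset."""
    with tables.open_file(path, mode="a") as f:
        f.root.data.append(rows)


def read_block(path, row_slice, col_slice):
    """Read only the requested block from disk."""
    with tables.open_file(path, mode="r") as f:
        return f.root.data[row_slice, col_slice]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "outarray.h5")
    create_store(path, row_size=100)
    append_rows(path, np.random.rand(200, 100))
    print(read_block(path, slice(1, 10), slice(2, 20)).shape)  # (9, 18)
```

Opening the file once per append is convenient but adds overhead; when many rows are produced in one run, keeping a single file handle open for the whole loop, as in Step 1, is likely the better choice.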
