How to concatenate two numpy arrays in hdf5 format?


Problem description


I have two numpy arrays stored in hdf5 that are 44 GB each. I need to concatenate them, but it has to happen on disk because I only have 8 GB of RAM. How would I do this?

Thanks!

Answer


The related post shows how to obtain distinct datasets in the resulting file. In Python it is possible, but you will need to read and write the datasets in multiple operations: say, read 1 GB from file 1, write it to the output file, and repeat until all of file 1 has been copied; then do the same for file 2. You need to declare a dataset of the appropriate final size in the output file up front:

d = f.create_dataset('name_of_dataset', shape=shape, dtype=dtype, data=None)


where shape is computed from the input datasets and dtype matches theirs.


To write to d: d[i*N:(i+1)*N] = d_from_file_1[i*N:(i+1)*N]


This loads the datasets into memory only one chunk at a time.
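A minimal sketch of the whole chunked-copy loop with h5py, assuming both inputs hold a dataset named "data" (the file names, dataset name, and the small demo sizes are illustrative stand-ins for the 44 GB arrays):

```python
import numpy as np
import h5py

# Build two small demo files, stand-ins for the 44 GB inputs.
for name, start in [("file1.h5", 0), ("file2.h5", 100)]:
    with h5py.File(name, "w") as f:
        f.create_dataset("data", data=np.arange(start, start + 100, dtype="f8"))

CHUNK = 32  # rows per read/write; pick this so one chunk fits in RAM

with h5py.File("file1.h5", "r") as f1, \
     h5py.File("file2.h5", "r") as f2, \
     h5py.File("out.h5", "w") as out:
    d1, d2 = f1["data"], f2["data"]
    total = d1.shape[0] + d2.shape[0]
    # Declare the output dataset at its final size up front.
    d = out.create_dataset("data", shape=(total,) + d1.shape[1:], dtype=d1.dtype)
    offset = 0
    for src in (d1, d2):
        # Copy one chunk at a time; only CHUNK rows are ever in memory.
        for i in range(0, src.shape[0], CHUNK):
            stop = min(i + CHUNK, src.shape[0])
            d[offset + i : offset + stop] = src[i:stop]
        offset += src.shape[0]
```

The writes go straight from one on-disk dataset to the other through a CHUNK-sized buffer, so peak memory use is governed by CHUNK rather than by the total array size.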

