连接大量 HDF5 文件 [英] Concatenate a large number of HDF5 files

查看：36 发布时间：2022/1/21 12:43:03 dataset hdf5 scientific-computing

本文介绍了连接大量 HDF5 文件的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有大约 500 个 HDF5 文件，每个文件大约 1.5 GB.

I have about 500 HDF5 files each of about 1.5 GB.

每个文件都具有相同的精确结构，即 7 个复合(int、double、double)数据集和可变数量的样本.

Each of the files has the same exact structure, which is 7 compound (int,double,double) datasets and variable number of samples.

现在我想通过连接每个数据集来连接所有这些文件，这样最后我就有一个包含 7 个数据集的 750 GB 文件.

Now I want to concatenate all this files by concatenating each of the datasets so that at the end I have a single 750 GB file with my 7 datasets.

目前我正在运行一个 h5py 脚本:

Currently I am running a h5py script which:

创建一个具有无限最大值的正确数据集的 HDF5 文件
依次打开所有文件
检查样本数量是多少(因为它是可变的)
调整全局文件的大小
附加数据

这显然需要好几个小时，你有什么改进的建议吗?

this obviously takes many hours, would you have a suggestion about improving this?

我正在开发一个集群，所以我可以并行使用 HDF5，但是我在 C 编程方面还不够好，无法自己实现某些东西，我需要一个已经编写好的工具.

I am working on a cluster, so I could use HDF5 in parallel, but I am not good enough in C programming to implement something myself, I would need a tool already written.

连接大量 HDF5 文件 [英] Concatenate a large number of HDF5 files

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

连接大量 HDF5 文件 [英] Concatenate a large number of HDF5 files

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭