Deleting information from an HDF5 file


Question

I realize that an SO user has formerly asked this question, but it was asked in 2009, and I was hoping that more knowledge of HDF5 was available or that newer versions had fixed this particular issue. To restate the question here concerning my own problem:

I have a gigantic file of nodes and elements from a large geometry and have already retrieved all the useful information I need from it. Therefore, in Python, I am trying to keep the original file, but delete the information I do not need and fill in more information from other sources. For example, I have a dataset of nodes that I don't need. However, I need to keep the neighboring dataset and include information about their indices from an outside file. Is there any way to delete these specific datasets?

Or is the old idea of having "placekeepers" in the HDF5 file still holding true, such that no one knows how to (or bothers to) remove info? I'm not too worried about the empty space, as long as it is faster to simply remove and add information than to create an entirely new file.

Note: I'm using h5py's 'r+' mode to read and write.

Answer

Removing entire nodes (groups or datasets) from an HDF5 file should be no problem.
However, if you want to reclaim the space, you have to run the h5repack tool.
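In h5py, unlinking a node is just a `del` on the file (or group) object. A minimal sketch, assuming h5py is installed; the file name and dataset names here are made up for illustration:

```python
import h5py

# Create a small example file with two datasets (names are illustrative).
with h5py.File("example.h5", "w") as f:
    f.create_dataset("nodes", data=list(range(10)))
    f.create_dataset("elements", data=list(range(5)))

# Reopen in 'r+' mode and unlink the dataset we no longer need.
with h5py.File("example.h5", "r+") as f:
    del f["nodes"]           # removes the link; the space is NOT reclaimed yet
    assert "nodes" not in f  # the dataset is gone from the file's namespace
    assert "elements" in f   # other datasets are untouched

# The file itself does not shrink until it is repacked, e.g. from the shell:
#   h5repack example.h5 repacked.h5
```

Note that after `del` the data is merely unreachable; the file size on disk stays the same until h5repack copies the live objects into a fresh file.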

From the HDF5 documentation:

5.5.2. Deleting a Dataset from a File and Reclaiming Space

HDF5 does not at this time provide an easy mechanism to remove a dataset from a file or to reclaim the storage space occupied by a deleted object.

Removing a dataset and reclaiming the space it used can be done with the H5Ldelete function and the h5repack utility program. With the H5Ldelete function, links to a dataset can be removed from the file structure. After all the links have been removed, the dataset becomes inaccessible to any application and is effectively removed from the file. The way to recover the space occupied by an unlinked dataset is to write all of the objects of the file into a new file. Any unlinked object is inaccessible to the application and will not be included in the new file. Writing objects to a new file can be done with a custom program or with the h5repack utility program.

Alternatively, you can also have a look at PyTables's ptrepack tool. PyTables should be able to read h5py HDF5 files, and the ptrepack tool is similar to h5repack.

If you want to remove records from a dataset, then you probably have to retrieve the records you want to keep, create a new dataset, and remove the old one.
PyTables supports removing rows, though it's not recommended.
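The copy-and-replace approach for dropping individual records can be sketched with h5py as follows (the file name, dataset name, and keep-every-even-row filter are made up for illustration):

```python
import numpy as np
import h5py

# Build an example file with one dataset of ten records.
with h5py.File("rows.h5", "w") as f:
    f.create_dataset("data", data=np.arange(10))

with h5py.File("rows.h5", "r+") as f:
    values = f["data"][:]          # read the records into memory
    keep = values % 2 == 0         # boolean mask of rows to keep (illustrative)
    kept = values[keep]            # the surviving records
    del f["data"]                  # unlink the old dataset
    f.create_dataset("data", data=kept)  # write the filtered copy under the same name

with h5py.File("rows.h5", "r") as f:
    print(f["data"][:])            # [0 2 4 6 8]
```

As with whole-dataset deletion, the old rows still occupy space in the file until it is repacked.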
