Searching an HDF5 dataset

Question

I'm currently exploring HDF5. I've read the interesting comments in the thread "Evaluating HDF5", and I understand that HDF5 is a solution of choice for storing data, but how do you query it? For example, say I have a big file containing some identifiers: is there a way to quickly know whether a given identifier is present in the file?

Recommended answer

I think the answer is "not directly".

Here are some ways I think you could achieve this.

Using groups:

A hierarchy of groups could be arranged as a radix tree to store the data, though this probably doesn't scale very well.
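A minimal sketch of the group-hierarchy idea, assuming identifiers are strings cut into fixed-width chunks. The chunk width of 2 and the helpers `group_path`, `insert` and `contains` are hypothetical. Strictly, this is a trie rather than a compressed radix tree, and nested dicts stand in for HDF5 groups so the example runs without any HDF5 library; in a real file each chunk would be a group, and a lookup would amount to checking whether the group path exists.

```python
from __future__ import annotations

# Hypothetical sketch: nested dicts stand in for HDF5 groups. Each identifier
# is cut into fixed-width chunks and each chunk becomes one level of nesting.

LEAF = None  # sentinel key; can never collide with a string chunk


def group_path(identifier: str, width: int = 2) -> list[str]:
    """Split an identifier into chunks, e.g. 'abcdef' -> ['ab', 'cd', 'ef']."""
    return [identifier[i:i + width] for i in range(0, len(identifier), width)]


def insert(root: dict, identifier: str, width: int = 2) -> None:
    node = root
    for part in group_path(identifier, width):
        node = node.setdefault(part, {})  # create the "group" if missing
    node[LEAF] = True  # mark that a complete identifier ends here


def contains(root: dict, identifier: str, width: int = 2) -> bool:
    node = root
    for part in group_path(identifier, width):
        if part not in node:
            return False  # a missing "group" on the path means absent
        node = node[part]
    return node.get(LEAF, False)


root: dict = {}
for ident in ["abcdef", "abcd", "xyz123"]:
    insert(root, ident)
print(contains(root, "abcd"))    # True
print(contains(root, "abcdee"))  # False
print(contains(root, "ab"))      # False: "ab" is only a prefix
```

Each lookup touches one level per chunk, so its cost grows with the identifier length rather than with the number of identifiers stored; the price is one group per distinct prefix.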

Using index datasets:

HDF5 has a reference type that could be used to link separate index tables to a main table. After writing the main data, you could add further datasets, each sorted on a different key and holding references back into the main table. For example:

MainDataset (sorted on identifier)
0: { A, "C", 2 }
1: { B, "B", 1 }
2: { C, "A", 3 }

StringIndex
0: { "A", Reference ("MainDataset", 2) }
1: { "B", Reference ("MainDataset", 1) }
2: { "C", Reference ("MainDataset", 0) }

IntIndex
0: { 1, Reference ("MainDataset", 1) }
1: { 2, Reference ("MainDataset", 0) }
2: { 3, Reference ("MainDataset", 2) }
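A minimal sketch of how the index tables above could be derived from the main table, assuming each reference is modeled as a plain row number (a real file would store HDF5 references instead). `MAIN` and `build_index` are hypothetical names.

```python
# Rows of the main table, mirroring MainDataset above: (identifier, string, int).
MAIN = [("A", "C", 2), ("B", "B", 1), ("C", "A", 3)]


def build_index(main, column):
    """Return (key, row) pairs sorted on main[row][column].

    The row number stands in for an HDF5 reference back into the main table.
    """
    return sorted((record[column], row) for row, record in enumerate(main))


string_index = build_index(MAIN, 1)
int_index = build_index(MAIN, 2)
print(string_index)  # [('A', 2), ('B', 1), ('C', 0)]
print(int_index)     # [(1, 1), (2, 0), (3, 2)]
```

The output matches the StringIndex and IntIndex tables in the example, so each index is just the (key, reference) pairs sorted on one column.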

To use the above, you would have to write a binary search to look up a field in the index tables.
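A minimal lower-bound binary search over such an index table, assuming the index has been read into a Python list of (key, row) pairs sorted on key. `find_row` is a hypothetical helper; in a file-backed version, each probe of `index[mid]` would be a small read from the index dataset. Python's standard `bisect` module offers the same search ready-made.

```python
# Index tables from the example above; the second field of each pair is a row
# number standing in for a reference into MainDataset.
MAIN = [("A", "C", 2), ("B", "B", 1), ("C", "A", 3)]
STRING_INDEX = [("A", 2), ("B", 1), ("C", 0)]
INT_INDEX = [(1, 1), (2, 0), (3, 2)]


def find_row(index, key):
    """Binary-search an index sorted on key; return the referenced row or None."""
    lo, hi = 0, len(index)
    while lo < hi:  # invariant: keys before lo are < key, keys from hi on are >= key
        mid = (lo + hi) // 2
        if index[mid][0] < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(index) and index[lo][0] == key:
        return index[lo][1]
    return None  # key is not present


row = find_row(STRING_INDEX, "A")
print(row, MAIN[row])               # 2 ('C', 'A', 3)
print(find_row(INT_INDEX, 3))       # 2
print(find_row(STRING_INDEX, "D"))  # None
```

Each lookup costs O(log n) probes, which is what makes a sorted index dataset worthwhile for the "is this identifier present?" question.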

In-memory index:

Depending on the size of the dataset, it may be just as easy to keep an in-memory index that is serialized to, and read back from, its own dataset using something like Boost.Serialization.
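Boost.Serialization is a C++ library; as a rough Python analogue, here is a minimal sketch that assumes the in-memory index is a dict from identifier to row number and uses the standard-library `json` module as the serialization format (an assumption, chosen because it is portable and readable). The resulting bytes would be written to, and read back from, a dataset of their own; the file I/O is left out.

```python
import json


def serialize_index(index: dict) -> bytes:
    """Encode an {identifier: row} dict as UTF-8 JSON bytes."""
    # sort_keys makes the output deterministic for a given index.
    return json.dumps(index, sort_keys=True).encode("utf-8")


def deserialize_index(blob: bytes) -> dict:
    """Decode bytes produced by serialize_index back into a dict."""
    return json.loads(blob.decode("utf-8"))


index = {"C": 2, "A": 0, "B": 1}
blob = serialize_index(index)  # these bytes would go into their own dataset
print(blob)                    # b'{"A": 0, "B": 1, "C": 2}'
restored = deserialize_index(blob)
print("B" in restored)         # True: a hash lookup, no binary search needed
```

JSON object keys are always strings, so non-string identifiers would need converting first; a binary format would be more compact if the index is large.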

HDF5-FastQuery:

This paper (and also this page) describes the use of bitmap indices to perform complex queries over an HDF5 dataset. I have not tried this.
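To show what a bitmap index is, here is a minimal uncompressed sketch in plain Python: every distinct value of a column gets a bitmap (a Python int) whose bit i is set when row i holds that value. The column data and helper names are hypothetical, and this is neither the FastQuery API nor its storage format; it only illustrates why such indices make compound queries cheap, since each condition becomes a bitwise AND or OR.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when values[i] == value."""
    bitmaps = defaultdict(int)
    for row, value in enumerate(values):
        bitmaps[value] |= 1 << row
    return dict(bitmaps)


def select(index, predicate):
    """OR together the bitmaps of all distinct values satisfying predicate."""
    result = 0
    for value, bitmap in index.items():
        if predicate(value):
            result |= bitmap
    return result


def rows_of(bitmap):
    """Decode a bitmap into the sorted list of row numbers it contains."""
    rows, row = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Hypothetical columns of a six-row table.
color = ["red", "blue", "red", "green", "blue", "red"]
size = [1, 3, 2, 3, 1, 3]
color_idx = build_bitmap_index(color)
size_idx = build_bitmap_index(size)

# color == "red" AND size == 3
print(rows_of(color_idx["red"] & size_idx[3]))  # [5]
# color in {"red", "green"} AND size >= 2
hits = select(color_idx, lambda c: c in {"red", "green"}) & select(size_idx, lambda s: s >= 2)
print(rows_of(hits))  # [2, 3, 5]
```

Production bitmap indices usually compress their bitmaps; this sketch keeps them uncompressed for clarity.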
