Evaluating HDF5: What limitations/features does HDF5 provide for modelling data?

Problem description

We are evaluating technologies to store the data we gather while analyzing C/C++ code. In the case of C++, the amount of data can be relatively large: ~20 MB per TU (translation unit).

After reading a related SO answer, I began to think that HDF5 might be a suitable technology for us. I was wondering if people here could help me answer a few initial questions:


  1. Performance. The general usage pattern for the data will be write once, read "several" times, similar to the lifetime of a '.o' file generated by a compiler. How does HDF5 compare with something like an SQLite DB? Is that even a reasonable comparison to make?

  2. Over time we will add to the information that we store, but we will not necessarily want to redistribute a completely new set of "readers" to support a new format. After reading the user guide, my understanding is that HDF5 is similar to XML or a DB in that information is associated with a tag/column, so a tool built to read an older structure will simply ignore the fields it is not concerned with. Is my understanding correct?

  3. A significant chunk of the information we wish to write out will be tree-structured: scope hierarchy, type hierarchy, etc. Ideally we would model scopes as having parents, children, etc. Is it possible to have one HDF5 object "point" to another? If not, is there a standard technique for solving this with HDF5? Or, as in a DB, do we need a unique key that "links" one object to another, with appropriate lookups when searching for the data?

Many thanks!

Recommended answer


How does HDF5 compare against using something like an SQLite DB? Is that even a reasonable comparison to make?

Sort of similar, but not really. They're both structured files. SQLite has features to support database queries using SQL; HDF5 has features to support large scientific datasets (for example, chunked and compressed n-dimensional arrays that can be read in part).

They're both meant to be high-performance.
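For contrast, here is the "unique key" approach from the question, sketched with Python's built-in sqlite3 module: each scope row stores its parent's id, and a recursive common table expression walks up the tree. The table and column names (`scope`, `parent_id`) and the sample scopes are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database for illustration; a real tool would pass a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scope ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES scope(id),"  # NULL for the root scope
    " name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO scope (id, parent_id, name) VALUES (?, ?, ?)",
    [
        (1, None, "::"),               # global scope
        (2, 1, "ns"),                  # namespace inside the global scope
        (3, 2, "ns::Widget"),          # class inside ns
        (4, 3, "ns::Widget::draw"),    # member function inside the class
        (5, 1, "main"),
    ],
)


def ancestors(conn, scope_id):
    """Return scope names from scope_id up to the root, via a recursive CTE."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, parent_id, name, depth) AS (
            SELECT id, parent_id, name, 0 FROM scope WHERE id = ?
            UNION ALL
            SELECT s.id, s.parent_id, s.name, c.depth + 1
            FROM scope AS s JOIN chain AS c ON s.id = c.parent_id
        )
        SELECT name FROM chain ORDER BY depth
        """,
        (scope_id,),
    ).fetchall()
    return [name for (name,) in rows]


def children(conn, scope_id):
    """Return the names of the direct children of scope_id, in id order."""
    rows = conn.execute(
        "SELECT name FROM scope WHERE parent_id = ? ORDER BY id", (scope_id,)
    ).fetchall()
    return [name for (name,) in rows]


print(ancestors(conn, 4))  # ['ns::Widget::draw', 'ns::Widget', 'ns', '::']
print(children(conn, 1))   # ['ns', 'main']
```

This kind of ad hoc traversal is where SQLite's query language pays off; HDF5's standard API offers no comparable query language, so the equivalent lookups would be written in application code.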


Over time we will add to the information that we are storing, but will not necessarily want to re-distribute a completely new set of "readers" to support a new format.

If you store data in structured form, the data types of those structures are also stored in the HDF5 file. I'm a bit rusty on how this works (e.g. whether it includes innate backwards compatibility), but I do know that if you design your "reader" correctly, it should be able to handle types that change in the future.
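A minimal sketch of such a "correctly designed reader", in Python with h5py (assuming h5py 3.0 or later, which provides `Dataset.fields()`). HDF5 converts compound types member by member, matching members by name, so an older reader can request only the fields it knows about; it also never opens datasets it does not know exist. The file layout, dataset names (`symbols`, `macros`), field names and the `format_version` attribute are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical "version 2" record layout: a newer writer added the "line" field.
SYMBOL_V2 = np.dtype([("name", "S32"), ("kind", np.int32), ("line", np.int32)])


def write_v2(path):
    """Newer writer: stores the extended compound type plus a brand-new dataset."""
    records = np.array([(b"main", 1, 10), (b"ns::Widget", 2, 42)], dtype=SYMBOL_V2)
    with h5py.File(path, "w") as f:
        f.create_dataset("symbols", data=records)
        f.create_dataset("macros", data=np.arange(3))  # old readers never open this
        f.attrs["format_version"] = 2


def read_v1(path):
    """Older reader: asks only for the fields it knows about ("name", "kind").

    Dataset.fields() makes HDF5 convert the on-disk compound type to a
    memory type containing just those members, matched by name; the
    unknown "line" member is dropped during the read.
    """
    with h5py.File(path, "r") as f:
        data = f["symbols"].fields(["name", "kind"])[:]
    return [(rec["name"].decode(), int(rec["kind"])) for rec in data]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tu.h5")
    write_v2(path)
    print(read_v1(path))  # [('main', 1), ('ns::Widget', 2)]
```

The same holds for whole objects: a reader that opens datasets and groups by path is unaffected by new objects added elsewhere in the file. What such a reader cannot survive is a field or object it depends on being renamed or removed.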


Is it possible to have one HDF5 object "point" to another?

Absolutely! You'll want to use attributes. Each object is reachable by one or more paths. HDF5 groups are analogous to folders/directories, except that folders/directories are hierarchical, so a unique path describes each one's location (at least in filesystems without hard links), whereas groups form a directed graph that can include cycles. HDF5 does let you store a "pointer" to an object directly as an attribute, in the form of an object reference. Alternatively, you can always store an absolute or relative path as a string attribute (or as a string anywhere else; you could have lookup tables galore if you wanted).
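A minimal sketch in Python with h5py of these options for a scope tree; the group and attribute names (`global`, `ns`, `Widget`, `parent`, `parent_path`, `types`) are hypothetical. Children are simply members of a group; the parent "pointer" is stored either as an HDF5 object reference (h5py's `Group.ref`, which h5py writes to an attribute with the reference datatype) or as the parent's path string; a soft link adds a path-based alias.

```python
import os
import tempfile

import h5py


def build_scopes(path):
    """Write a tiny hypothetical scope tree: /global -> /global/ns -> /global/ns/Widget."""
    with h5py.File(path, "w") as f:
        global_scope = f.create_group("global")
        ns = global_scope.create_group("ns")   # child relationship = group membership
        widget = ns.create_group("Widget")
        # Parent pointer as an HDF5 object reference (h5py infers the reference dtype).
        widget.attrs["parent"] = ns.ref
        # Fallback from the answer: the parent's absolute path as a string attribute.
        widget.attrs["parent_path"] = ns.name
        # A soft link is a path-based alias that is resolved when it is accessed.
        types = f.create_group("types")
        types["Widget_scope"] = h5py.SoftLink(widget.name)


def parents_of(path, scope_path):
    """Follow both kinds of parent pointer and return the parent's path for each."""
    with h5py.File(path, "r") as f:
        scope = f[scope_path]
        via_ref = f[scope.attrs["parent"]].name         # dereference the object reference
        via_path = f[scope.attrs["parent_path"]].name   # look up by the stored path string
        return via_ref, via_path


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scopes.h5")
    build_scopes(path)
    print(parents_of(path, "/global/ns/Widget"))
```

An object reference identifies the object itself rather than a path, so it keeps working if the parent group is later renamed or moved within the file; a stored path string would have to be updated by hand.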
