Reading HDF5 into C++ with memory problems


Question

I am rewriting code that I developed in Python in C++, mainly to improve speed, while also hoping to gain more experience in the language. I also plan to use OpenMP to parallelize this code across 48 cores that share 204 GB of memory.

The program I am writing is simple. I import a 3D HDF5 dataset A[T][X][E], where T indexes the timesteps of a simulation, X is the position at which the field is measured, and E(0:2) holds the x, y, z components of the electric field. Each element of A is a double, and the dimensions are A[15000][80][3].

The first hiccup I have run into is reading this 'large' h5 file into an array, and I would like a professional opinion before I continue. My first attempt:

...
#define RANK  3
#define DIM1  15001
#define DIM2  80
#define DIM3  3

using namespace std;
int main (void)
{
    // Define HDF5 variables for opening file.
    hid_t   file1, dataset1;
    double  bufnew[DIM1][DIM2][DIM3];
    herr_t  ret;
    uint    i, j, k;

    file1 = H5Fopen (FILE1, H5F_ACC_RDWR, H5P_DEFAULT);
    dataset1 = H5Dopen (file1, "EFieldOnLine", H5P_DEFAULT);
    ret = H5Dread (dataset1, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
                   H5P_DEFAULT, bufnew);

    cout << "Let's try dumping 0->100 elements" << endl;
    for(i=1; i < 100; i++) cout << bufnew[i][20][2] << endl;
...

This leads to a segmentation fault at the array declaration. My next move was to use either a 3D array (allocated with new) or a 3D vector. However, I have seen much debate against these methods and, more importantly, I only need ONE component of E, i.e. I would like to reshape A[T][X][E] -> B[T][X] for, say, the x-component of E.

Sorry for the lengthy post, but I wanted to be as clear as possible, and I would like to emphasize again that I am interested in learning how to write the fastest and most efficient code. I appreciate all of your suggestions, time and wisdom.

Recommended answer

Defining an array as a local variable means allocating it on the stack. The stack is usually limited to a few megabytes, and overflowing it typically results in a segfault. Large data structures should be allocated either dynamically on the heap (using the new operator) or statically (by defining them as global variables).
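
To make the size argument concrete, here is a minimal sketch that assumes the dimensions from the question's #define lines. The helper names buffer_bytes and make_buffer are hypothetical, and std::vector is used here in place of a raw new[] as one possible way to obtain heap storage; it is not what the answer itself uses.

```cpp
#include <cstddef>
#include <vector>

// Dimensions copied from the question's #define values.
constexpr std::size_t DIM1 = 15001;
constexpr std::size_t DIM2 = 80;
constexpr std::size_t DIM3 = 3;

// Bytes needed for a DIM1 x DIM2 x DIM3 array of double.
constexpr std::size_t buffer_bytes() {
    return DIM1 * DIM2 * DIM3 * sizeof(double);
}

// Hypothetical helper: the vector object itself is small; its
// DIM1*DIM2*DIM3 elements live on the heap and are zero-initialized.
std::vector<double> make_buffer() {
    return std::vector<double>(DIM1 * DIM2 * DIM3);
}
```

On a platform with 8-byte doubles, buffer_bytes() evaluates to 28,801,920 bytes (about 28.8 MB), which is well above common default stack limits. Because a std::vector stores its elements contiguously, the pointer returned by its data() member could be passed as the buffer argument of H5Dread in place of bufnew.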

I wouldn't advise making a vector of vectors of vectors for such dimensions.
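
The answer does not spell out its reason, but one likely concern is the number of separate heap buffers a nested vector needs. The sketch below counts them; the function name nested_vector_buffers is hypothetical, and the count assumes every inner vector is non-empty.

```cpp
#include <cstddef>

// Number of separate heap buffers behind a
// std::vector<std::vector<std::vector<double>>> of shape n1 x n2 x n3
// (n3 > 0 assumed): one for the outer vector, n1 for the middle
// vectors, and n1 * n2 for the innermost vectors.
constexpr std::size_t nested_vector_buffers(std::size_t n1, std::size_t n2) {
    return 1 + n1 + n1 * n2;
}
```

For the dimensions in the question, nested_vector_buffers(15001, 80) gives 1,215,082 separately allocated buffers. These buffers are not guaranteed to be adjacent in memory, whereas H5Dread fills a single buffer passed as one pointer, so a nested vector could not be handed to it directly.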

Instead, create a one-dimensional array to store all values:

double *bufnew = new double[DIM1*DIM2*DIM3];

Then access it with the following formula, which computes the linear position of a given 3D item:

bufnew[(T*DIM2+X)*DIM3+E] = ... ; // bufnew[T][X][E]

That should be enough.
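
As an illustration of the indexing formula, and of the A[T][X][E] -> B[T][X] reshaping the question asks about, here is a minimal sketch. The function names flat_index and extract_component are hypothetical, and std::vector stands in for the raw new[] buffer; the data is assumed to be in row-major order, as in the formula above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear offset of element [t][x][e] in a row-major array whose last
// two extents are dim2 and dim3 (same formula as in the answer).
inline std::size_t flat_index(std::size_t t, std::size_t x, std::size_t e,
                              std::size_t dim2, std::size_t dim3) {
    assert(x < dim2 && e < dim3);
    return (t * dim2 + x) * dim3 + e;
}

// Copy a single field component e out of A[dim1][dim2][dim3]
// into a new flat array B[dim1][dim2].
std::vector<double> extract_component(const std::vector<double>& a,
                                      std::size_t dim1, std::size_t dim2,
                                      std::size_t dim3, std::size_t e) {
    assert(a.size() == dim1 * dim2 * dim3);
    std::vector<double> b(dim1 * dim2);
    for (std::size_t t = 0; t < dim1; ++t) {
        for (std::size_t x = 0; x < dim2; ++x) {
            b[t * dim2 + x] = a[flat_index(t, x, e, dim2, dim3)];
        }
    }
    return b;
}
```

With the question's dimensions, extract_component(a, 15001, 80, 3, 0) would return the x-component as a flat array in which element [t][x] sits at offset t * 80 + x. The outer loop over t has independent iterations, so it is a natural candidate for the OpenMP parallelization mentioned in the question.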
