How can I write a large multidimensional array to an HDF5 file in parts?


Question

I'm using HDF5DotNet in C# and I have a very large array (several GB) that I want to write to an HDF5 file. It is too big to hold in memory all at once, so I generate one region at a time and want to write each region out as it is produced, while still having the file present a single large array when it is read back. I know HDF5 supports this, but the documentation for the .NET API is sparse.

I wrote a short example that uses a 5 x 3 array filled with the values 1..15:

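// Requires a reference to the HDF5DotNet assembly; the members below belong inside a class.
using System;
using System.IO;
using HDF5DotNet;

const int ROWS = 5;
const int COLS = 3;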

static void Main(string[] args)
{
    WriteWholeArray();
    WriteArrayByRows();
    ushort[,] array = ReadWholeArray();
}

static void WriteWholeArray()
{
    H5FileId h5 = H5F.create(Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "test.h5"), H5F.CreateMode.ACC_TRUNC);
    H5DataSpaceId dsi = H5S.create_simple(2, new long[] { ROWS, COLS });
    H5DataSetId dataset = H5D.create(h5, "array", new H5DataTypeId(H5T.H5Type.NATIVE_USHORT), dsi);
    ushort[,] array = new ushort[ROWS, COLS];
    ushort value = 1;
    for(int i = 0; i < array.GetLength(0); i++)
    {
        for (int j = 0; j < array.GetLength(1); j++)
        {
            array[i, j] = value++;
        }
    }
    H5D.write<ushort>(dataset, new H5DataTypeId(H5T.H5Type.NATIVE_USHORT), new H5Array<ushort>(array));
    H5D.close(dataset);
    H5F.close(h5);
}

static void WriteArrayByRows()
{
    H5FileId h5 = H5F.create(Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "test.h5"), H5F.CreateMode.ACC_TRUNC);
    H5DataSpaceId dsi = H5S.create_simple(2, new long[] { ROWS, COLS });
    H5DataSetId dataset = H5D.create(h5, "array", new H5DataTypeId(H5T.H5Type.NATIVE_USHORT), dsi);
    ushort[,] array = new ushort[ROWS, COLS];
    ushort value = 1;
    for (int i = 0; i < array.GetLength(0); i++)
    {
        for (int j = 0; j < array.GetLength(1); j++)
        {
            array[i, j] = value++;
        }
    }
    for(int i = 0; i < array.GetLength(0); i++)
    {
        H5S.selectHyperslab(dsi, H5S.SelectOperator.SET, new long[] { i, 0 }, new long[] { 1, array.GetLength(1) });
        ushort[,] row = new ushort[1, array.GetLength(1)];
        for(int j = 0; j < array.GetLength(1); j++)
        {
            row[0, j] = array[i, j];
        }
        H5D.write<ushort>(dataset, new H5DataTypeId(H5T.H5Type.NATIVE_USHORT), new H5Array<ushort>(row));
    }
    H5D.close(dataset);
    H5F.close(h5);
}

static ushort[,] ReadWholeArray()
{
    H5FileId h5 = H5F.open(Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "test.h5"), H5F.OpenMode.ACC_RDONLY);
    ushort[,] array = new ushort[ROWS, COLS];
    H5Array<ushort> h5_array = new H5Array<ushort>(array);
    H5DataSetId dataset = H5D.open(h5, "array");
    H5D.read<ushort>(dataset, new H5DataTypeId(H5T.H5Type.NATIVE_USHORT), h5_array);
    H5D.close(dataset);
    H5F.close(h5);
    return (array);
}

When I write the whole array at once, it reads back correctly. When I write it row by row, the array I read back contains some correct values (in the wrong elements), some zeroes, and some garbage values (e.g. 43440). Can somebody show me how to do this correctly?

Recommended answer

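I figured it out. When you write a hyperslab of an array, you need a second dataspace that describes the array in memory you are writing from, in addition to the file dataspace that holds the hyperslab selection. The short H5D.write overload I used originally takes neither dataspace, so the selection made on dsi was never passed to the write; presumably the one-row buffer was then treated as if it covered the whole dataset, which would explain the misplaced and garbage values. Here is the corrected function: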

static void WriteArrayByRows()
{
    H5FileId h5 = H5F.create(Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "test.h5"), H5F.CreateMode.ACC_TRUNC);
    H5DataSpaceId dsi = H5S.create_simple(2, new long[] { ROWS, COLS });
    H5DataSetId dataset = H5D.create(h5, "array", new H5DataTypeId(H5T.H5Type.NATIVE_USHORT), dsi);
    ushort[,] array = new ushort[ROWS, COLS];
    ushort value = 1;
    for (int i = 0; i < array.GetLength(0); i++)
    {
        for (int j = 0; j < array.GetLength(1); j++)
        {
            array[i, j] = value++;
        }
    }
    for (int i = 0; i < array.GetLength(0); i++)
    {
        // File-side selection: row i of the dataset, all columns.
        H5S.selectHyperslab(dsi, H5S.SelectOperator.SET, new long[] { i, 0 }, new long[] { 1, array.GetLength(1) });
        // Memory-side dataspace describing the 1 x COLS buffer being written.
        H5DataSpaceId dsi2 = H5S.create_simple(2, new long[] { 1, array.GetLength(1) });  // added
        ushort[,] row = new ushort[1, array.GetLength(1)];
        for (int j = 0; j < array.GetLength(1); j++)
        {
            row[0, j] = array[i, j];
        }
        H5PropertyListId pli = new H5PropertyListId(H5P.Template.DEFAULT);  // added
        // The memory dataspace (dsi2) comes first, then the file dataspace (dsi) holding the selection.
        H5D.write<ushort>(dataset, new H5DataTypeId(H5T.H5Type.NATIVE_USHORT), dsi2, dsi, pli, new H5Array<ushort>(row));  // modified
    }
    H5D.close(dataset);
    H5F.close(h5);
}
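
The corrected function still builds the full array in memory and copies one row out at a time. When the data is generated region by region, the same pattern can be driven by a list of row ranges instead. The following is a minimal sketch of that planning step only, assuming the regions are blocks of whole rows; `HyperslabPlan` and `RowBlocks` are hypothetical names for illustration and are not part of HDF5DotNet. The comments show how each range would feed the calls already used above.

```csharp
using System;
using System.Collections.Generic;

static class HyperslabPlan
{
    // Yields (Start, Count) row ranges that cover totalRows in blocks of at
    // most blockRows rows. The final block may be shorter than the others.
    public static IEnumerable<(long Start, long Count)> RowBlocks(long totalRows, long blockRows)
    {
        if (totalRows < 0) throw new ArgumentOutOfRangeException(nameof(totalRows));
        if (blockRows <= 0) throw new ArgumentOutOfRangeException(nameof(blockRows));

        for (long start = 0; start < totalRows; start += blockRows)
        {
            yield return (start, Math.Min(blockRows, totalRows - start));
        }
    }

    // Intended use with the calls from the corrected function (not compiled here):
    //
    //   foreach (var (start, count) in HyperslabPlan.RowBlocks(ROWS, blockRows))
    //   {
    //       // File side: rows [start, start + count), all columns.
    //       H5S.selectHyperslab(dsi, H5S.SelectOperator.SET,
    //           new long[] { start, 0 }, new long[] { count, COLS });
    //       // Memory side: a count x COLS buffer holding only this block.
    //       H5DataSpaceId memSpace = H5S.create_simple(2, new long[] { count, COLS });
    //       ushort[,] block = new ushort[count, COLS];   // fill with generated values
    //       H5D.write<ushort>(dataset, type, memSpace, dsi, pli, new H5Array<ushort>(block));
    //   }
}
```

With this layout only one block is resident at a time, so peak memory is governed by the chosen block height (`blockRows`, an assumed tuning parameter) and not by the size of the dataset.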

I also found chunking very useful for getting decent performance when writing my large array. Here is an example:

H5PropertyListId pli = H5P.create(H5P.PropertyListClass.DATASET_CREATE);  // added
H5P.setChunk(pli, new long[] { 1, COLS });  // added
H5DataSetId dataset = H5D.create(h5, "array", new H5DataTypeId(H5T.H5Type.NATIVE_USHORT), dsi, H5P.create(H5P.PropertyListClass.LINK_CREATE), pli, H5P.create(H5P.PropertyListClass.DATASET_ACCESS));  // modified
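
The snippet above uses one row per chunk. For narrow rows that can mean many very small chunks, so it may be worth grouping several rows into each chunk. The helper below is a rough sketch of one way to pick the chunk dimensions, assuming chunks made of whole rows and a byte budget chosen by the caller; `ChunkShape`, `ForRows`, and the budget itself are assumptions for illustration, not HDF5DotNet API or a recommendation from the original answer.

```csharp
using System;

static class ChunkShape
{
    // Returns { rowsPerChunk, cols } so that one chunk of whole rows stays at
    // or under targetBytes where possible. A chunk always holds at least one
    // row (even if a single row is larger than targetBytes) and never more
    // rows than the dataset has.
    public static long[] ForRows(long rows, long cols, int elementSize, long targetBytes)
    {
        if (rows <= 0 || cols <= 0 || elementSize <= 0 || targetBytes <= 0)
            throw new ArgumentOutOfRangeException();

        long rowBytes = cols * elementSize;
        long rowsPerChunk = Math.Max(1, targetBytes / rowBytes);
        return new long[] { Math.Min(rowsPerChunk, rows), cols };
    }
}
```

The result has the same `long[]` shape that is passed to `H5P.setChunk` in the snippet above, for example `H5P.setChunk(pli, ChunkShape.ForRows(ROWS, COLS, sizeof(ushort), targetBytes))`. Matching the chunk height to the height of the blocks being written is likely to matter more than the exact byte budget, so treat the numbers as something to measure and tune.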
