out of core 4D image tif storage as hdf5 python


Problem description

I have 27GB of 2D tiff files that represent slices of a movie of 3D images. I want to be able to slice this data as if it were a simple 4D numpy array. It looks like dask.array is a good tool for cleanly manipulating the array once it's stored as an hdf5 file.

How can I store these files as an hdf5 file in the first place if they do not all fit into memory? I am new to h5py and databases in general.

Thanks.

Answer

Edit: use dask.array's imread function

As of dask 0.7.0 you don't need to store your images in HDF5. Use the imread function directly instead:

In [1]: from skimage.io import imread

In [2]: im = imread('foo.1.tiff')

In [3]: im.shape
Out[3]: (5, 5, 3)

In [4]: ls foo.*.tiff
foo.1.tiff  foo.2.tiff  foo.3.tiff  foo.4.tiff

In [5]: from dask.array.image import imread

In [6]: im = imread('foo.*.tiff')

In [7]: im.shape
Out[7]: (4, 5, 5, 3)
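The same pattern can be exercised end to end on synthetic data. A minimal sketch, assuming skimage and a dask version that provides dask.array.image.imread (the temp directory and filenames are illustrative only):

```python
import os
import tempfile

import numpy as np
from skimage.io import imsave
from dask.array.image import imread

# Write four tiny synthetic tiff "slices" matching the foo.*.tiff pattern above
tmp = tempfile.mkdtemp()
for i in range(4):
    frame = np.random.randint(0, 255, size=(5, 5, 3), dtype=np.uint8)
    imsave(os.path.join(tmp, 'foo.%d.tiff' % i), frame)

# imread globs the pattern and lazily stacks the images along a new first axis
stack = imread(os.path.join(tmp, 'foo.*.tiff'))
print(stack.shape)  # (4, 5, 5, 3)

# Slicing is lazy; .compute() loads only the chunks actually needed
first_red = stack[0, :, :, 0].compute()
print(first_red.shape)  # (5, 5)
```

Because each file becomes one chunk, slicing along the first axis touches only the corresponding files on disk.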



Older answer that stores images into HDF5

Data ingest is often the trickiest of problems. Dask.array doesn't have any automatic integration with image files (though this is quite doable if there's sufficient interest). Fortunately, moving data to h5py is easy because h5py supports the numpy slicing syntax. In the following example we'll create an empty h5py Dataset, then store four tiny tiff files into that dataset in a for loop.

First we get filenames for our images (please forgive the toy dataset; I don't have anything realistic lying around).

In [1]: from glob import glob
In [2]: filenames = sorted(glob('foo.*.tiff'))
In [3]: filenames
Out[3]: ['foo.1.tiff', 'foo.2.tiff', 'foo.3.tiff', 'foo.4.tiff']

Load and inspect a sample image:

In [4]: from skimage.io import imread
In [5]: im = imread(filenames[0])  # a sample image
In [6]: im.shape  # tiny image
Out[6]: (5, 5, 3)
In [7]: im.dtype
Out[7]: dtype('int8')

Now we'll make an HDF5 file and an HDF5 dataset called '/x' within that file.

In [8]: import h5py
In [9]: f = h5py.File('myfile.hdf5', 'a')  # make an hdf5 file (recent h5py requires an explicit mode)
In [10]: out = f.require_dataset('/x', shape=(len(filenames), 5, 5, 3), dtype=im.dtype)
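For a stack that is actually 27GB, it is worth controlling the HDF5 chunk layout at creation time. A hedged sketch with hypothetical frame dimensions (the sizes and filename are illustrative, not from the question; chunks= and compression= are standard h5py options forwarded to create_dataset):

```python
import os
import tempfile

import h5py

# Hypothetical dimensions for illustration only: 1000 frames of 512x512 RGB
n_frames, h, w, c = 1000, 512, 512, 3

path = os.path.join(tempfile.mkdtemp(), 'myfile_chunked.hdf5')
f = h5py.File(path, 'a')

# chunks=(1, h, w, c) stores one frame per HDF5 chunk, matching the
# one-image-at-a-time write loop; compression='gzip' trades CPU for disk space
out = f.require_dataset('/x', shape=(n_frames, h, w, c), dtype='uint8',
                        chunks=(1, h, w, c), compression='gzip')
chunk_shape = out.chunks
f.close()
```

Aligning the HDF5 chunk shape with the per-frame dask chunks used later avoids reading whole-file stripes when slicing single frames.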

Great, now we can insert our images one at a time into the HDF5 dataset.

In [11]: for i, fn in enumerate(filenames):
   ....:     im = imread(fn)
   ....:     out[i, :, :, :] = im

At this point dask.array can happily wrap out:

In [12]: import dask.array as da
In [13]: x = da.from_array(out, chunks=(1, 5, 5, 3))  # treat each image as a single chunk
In [14]: x[::2, :, :, 0].mean()
Out[14]: dask.array<x_3, shape=(), chunks=(), dtype=float64>
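Note that Out[14] is a lazy expression, not a number; calling .compute() triggers the actual chunked computation. A minimal sketch, with a small in-memory array standing in for the h5py dataset:

```python
import numpy as np
import dask.array as da

# Stand-in for the h5py dataset above: a small in-memory 4D array
data = np.arange(4 * 5 * 5 * 3, dtype='float64').reshape(4, 5, 5, 3)
x = da.from_array(data, chunks=(1, 5, 5, 3))  # one chunk per image

# Building the expression is lazy and cheap ...
expr = x[::2, :, :, 0].mean()

# ... and .compute() runs it chunk by chunk, returning a concrete value
result = expr.compute()
```

The same .compute() call works when x wraps the h5py dataset, in which case only the needed chunks are read from disk.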

If you'd like to see more native support for stacks of images then I encourage you to raise an issue. It would be pretty easy to use dask.array off of your stack of tiff files directly without going through HDF5.

