Creating a reference to an HDF dataset in h5py using astype


Question

From the h5py docs, I see that I can cast an HDF dataset to another type using the dataset's astype method. This returns a context manager which performs the conversion on the fly.

However, I would like to read in a dataset stored as uint16 and cast it to float32. After that, I would like to extract various slices from this dataset in a different function, with each slice already cast to float32. The docs explain the usage as

with dataset.astype('float32'):
   castdata = dataset[:]

This causes the entire dataset to be read in and converted to float32, which is not what I want. I would like to have a reference to the dataset that behaves as if it were float32, the equivalent of numpy's astype. How do I create a reference to the .astype('float32') object so that I can pass it to another function for use?

An example:

import h5py as HDF
import numpy as np
intdata = (100*np.random.random(10)).astype('uint16')

# create the HDF dataset
def get_dataset_as_float():
    hf = HDF.File('data.h5', 'w')
    d = hf.create_dataset('data', data=intdata)
    print(d.dtype)
    # uint16

    with d.astype('float32'):
        # This won't work: the context expires on return, so the caller
        # gets a plain uint16 dataset reference
        return d

    # this works but causes the entire dataset to be read & converted
    # with d.astype('float32'):
    #   return d[:]

Furthermore, it seems that the astype context only takes effect when the data elements are actually accessed. This means that

def use_data():
   d = get_dataset_as_float()
   # this is a uint16 dataset

   # try to use it as a float32
   with d.astype('float32'):
       print(np.max(d))   # --> output is uint16
       print(np.max(d[:]))   # --> output is float32, but entire data is loaded

So is there no numpy-esque way of using astype?

Recommended answer

d.astype() returns an AstypeContext object. If you look at the source for AstypeContext, you'll get a better idea of what's going on:

class AstypeContext(object):

    def __init__(self, dset, dtype):
        self._dset = dset
        self._dtype = numpy.dtype(dtype)

    def __enter__(self):
        self._dset._local.astype = self._dtype

    def __exit__(self, *args):
        self._dset._local.astype = None

When you enter the AstypeContext, the ._local.astype attribute of your dataset is set to the requested type. When you exit the context, it is reset to None, so subsequent reads return the stored type again.
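To make the mechanism concrete, here is a toy model of it using only the standard library. ToyDataset, ToyAstypeContext and get_inside_context are hypothetical names invented for this sketch, and the classes are not h5py's implementation; they only mimic the pattern of a per-thread setting that is switched on in __enter__, consulted at read time, and cleared in __exit__.

```python
import threading


class ToyDataset:
    """Hypothetical stand-in for a dataset; not h5py's real implementation."""

    def __init__(self, values):
        self._values = list(values)
        # Per-thread storage for the currently requested output type.
        self._local = threading.local()
        self._local.astype = None

    def astype(self, caster):
        return ToyAstypeContext(self, caster)

    def __getitem__(self, key):
        raw = self._values[key]
        caster = getattr(self._local, "astype", None)
        if caster is None:
            return raw
        # The conversion happens only here, at read time.
        if isinstance(key, slice):
            return [caster(v) for v in raw]
        return caster(raw)


class ToyAstypeContext:
    def __init__(self, dset, caster):
        self._dset = dset
        self._caster = caster

    def __enter__(self):
        self._dset._local.astype = self._caster

    def __exit__(self, *args):
        self._dset._local.astype = None


def get_inside_context(dset):
    with dset.astype(float):
        return dset  # __exit__ still runs on return and clears the setting


d = ToyDataset([81, 65, 33])
with d.astype(float):
    inside = d[:2]                    # read while the setting is active
outside = d[:2]                       # read after the context has exited
returned = get_inside_context(d)[:2]  # read through the returned reference

print(inside, outside, returned)
```

Only the read made inside the with block is converted. Returning the dataset from inside the block does not help, because the setting lives on the dataset and is cleared as soon as the block is left.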

You can therefore get more or less the behaviour you're looking for like this:

def get_dataset_as_type(d, dtype='float32'):

    # creates a new Dataset instance that points to the same HDF5 identifier
    d_new = HDF.Dataset(d.id)

    # set the ._local.astype attribute to the desired output type
    d_new._local.astype = np.dtype(dtype)

    return d_new

When you now read from d_new, you will get float32 numpy arrays back rather than uint16:

# hf is the open file from the question: hf = HDF.File('data.h5', 'w')
d = hf.create_dataset('data', data=intdata)
d_new = get_dataset_as_type(d, dtype='float32')

print(d[:])
# array([81, 65, 33, 22, 67, 57, 94, 63, 89, 68], dtype=uint16)
print(d_new[:])
# array([ 81.,  65.,  33.,  22.,  67.,  57.,  94.,  63.,  89.,  68.], dtype=float32)

print(d.dtype, d_new.dtype)
# uint16, uint16

Note that setting ._local.astype doesn't update the .dtype attribute of d_new (which seems to be immutable), so d_new.dtype still reports uint16. If you also wanted to change the dtype attribute, you'd probably need to subclass h5py.Dataset in order to do so.
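An alternative that avoids both the private ._local attribute (which may change between h5py releases) and subclassing is a small wrapper object that casts each selection as it is read. The sketch below is only an illustration: CastView and use_data are hypothetical names, and a NumPy array stands in for the HDF5 dataset so the example runs without a file. The wrapper relies only on the wrapped object supporting indexing, len() and .shape, which an h5py Dataset does. As far as I know, h5py 3.0 and later also let you index the object returned by astype directly, as in d.astype('float32')[2:5], which would make a helper like this unnecessary.

```python
import numpy as np


class CastView:
    """Hypothetical helper: casts each selection of a wrapped dataset on read."""

    def __init__(self, dset, dtype):
        self._dset = dset
        # Unlike d_new above, the wrapper can report the cast type.
        self.dtype = np.dtype(dtype)

    @property
    def shape(self):
        return self._dset.shape

    def __len__(self):
        return len(self._dset)

    def __getitem__(self, key):
        # Only the requested selection is read, then converted.
        return np.asarray(self._dset[key]).astype(self.dtype)


def use_data(view):
    # Works on a slice; never asks for the whole dataset.
    return view[2:5].max()


# Assumption: a NumPy array stands in for the uint16 HDF5 dataset here.
stored = np.array([81, 65, 33, 22, 67, 57, 94, 63, 89, 68], dtype="uint16")
view = CastView(stored, "float32")

part = view[2:5]
print(part.dtype, view.dtype, view.shape)
print(use_data(view))
```

The view can be handed to any function that slices its input. Functions that consume the object as a whole, such as np.max(view), would still need an explicit slice like view[:] first.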
