Creating reference to HDF dataset in h5py using astype
Question
From the h5py docs, I see that I can cast an HDF dataset as another type using the astype method for datasets. This returns a context manager which performs the conversion on the fly.
However, I would like to read in a dataset stored as uint16 and then cast it to float32. Thereafter, I would like to extract various slices from this dataset, as the cast type float32, in a different function. The docs explain the use as:
with dataset.astype('float32'):
    castdata = dataset[:]
This would cause the entire dataset to be read in and converted to float32, which is not what I want. I would like to have a reference to the dataset that is cast to float32, equivalent to numpy.astype. How do I create a reference to the .astype('float32') object so that I can pass it to another function for use?
An example:
import h5py as HDF
import numpy as np

intdata = (100*np.random.random(10)).astype('uint16')

# create the HDF dataset
def get_dataset_as_float():
    hf = HDF.File('data.h5', 'w')
    d = hf.create_dataset('data', data=intdata)
    print(d.dtype)
    # uint16

    with d.astype('float32'):
        # This won't work since the context expires. Returns a uint16 dataset reference
        return d

    # this works but causes the entire dataset to be read & converted
    # with d.astype('float32'):
    #     return d[:]
Furthermore, it seems like the astype context only applies when the data elements are accessed. This means that
def use_data():
    d = get_dataset_as_float()
    # this is a uint16 dataset

    # try to use it as a float32
    with d.astype('float32'):
        print(np.max(d))     # --> output is uint16
        print(np.max(d[:]))  # --> output is float32, but entire data is loaded
So is there not a numpy-esque way of using astype?
Recommended answer
d.astype() returns an AstypeContext object. If you look at the source for AstypeContext you'll get a better idea of what's going on:
class AstypeContext(object):

    def __init__(self, dset, dtype):
        self._dset = dset
        self._dtype = numpy.dtype(dtype)

    def __enter__(self):
        self._dset._local.astype = self._dtype

    def __exit__(self, *args):
        self._dset._local.astype = None
When you enter the AstypeContext, the ._local.astype attribute of your dataset gets updated to the new desired type, and when you exit the context it gets reset to None.
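To make the enter/exit behaviour concrete, here is a toy sketch of the same pattern using only the standard library. FakeDataset and CastContext are hypothetical stand-ins I made up for illustration (they are not h5py classes), and the `_astype` attribute only plays the role of `._local.astype`. It shows why a dataset returned from inside the `with` block reads as the original type afterwards.

```python
class FakeDataset:
    """Hypothetical stand-in for a dataset; this is not h5py code."""

    def __init__(self, values):
        self._values = list(values)
        self._astype = None  # plays the role of ._local.astype

    def astype(self, convert):
        # Like d.astype(...), this only builds the context object.
        return CastContext(self, convert)

    def __getitem__(self, key):
        out = self._values[key]
        if self._astype is None:
            return out  # no conversion is active
        if isinstance(key, slice):
            return [self._astype(v) for v in out]
        return self._astype(out)


class CastContext:
    """Sets the conversion on enter and clears it on exit."""

    def __init__(self, dset, convert):
        self._dset = dset
        self._convert = convert

    def __enter__(self):
        self._dset._astype = self._convert

    def __exit__(self, *args):
        self._dset._astype = None


def get_dataset_as_float(d):
    with d.astype(float):
        return d  # the context exits as the function returns


d = FakeDataset([81, 65, 33])

with d.astype(float):
    inside = d[:]    # read inside the context: values are converted

leaked = get_dataset_as_float(d)
outside = leaked[:]  # read after the context exited: values are unchanged

print(inside, outside)
```

The conversion is a property of the moment you read, not of the object you hold, which is why passing `d` around does not carry the cast with it.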
You can therefore get more or less the behaviour you're looking for like this:
def get_dataset_as_type(d, dtype='float32'):

    # creates a new Dataset instance that points to the same HDF5 identifier
    d_new = HDF.Dataset(d.id)

    # set the ._local.astype attribute to the desired output type
    d_new._local.astype = np.dtype(dtype)

    return d_new
When you now read from d_new, you will get float32 numpy arrays back rather than uint16:
d = hf.create_dataset('data', data=intdata)
d_new = get_dataset_as_type(d, dtype='float32')
print(d[:])
# array([81, 65, 33, 22, 67, 57, 94, 63, 89, 68], dtype=uint16)
print(d_new[:])
# array([ 81., 65., 33., 22., 67., 57., 94., 63., 89., 68.], dtype=float32)
print(d.dtype, d_new.dtype)
# uint16, uint16
Note that this doesn't update the .dtype attribute of d_new (which seems to be immutable). If you also wanted to change the dtype attribute, you'd probably need to subclass h5py.Dataset in order to do so.
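If you'd rather not touch private attributes or subclass, one alternative is a small wrapper object that reports the target dtype and converts each slice as it is read. The sketch below is only an illustration: CastView is a hypothetical helper of my own, not part of h5py, and a NumPy array stands in for the uint16 dataset so the example runs without an HDF5 file. Note that the conversion here happens in NumPy after the slice has been read, not inside HDF5.

```python
import numpy as np


class CastView:
    """Read-only view that converts each slice to `dtype` as it is read.

    Hypothetical helper, not part of h5py. `source` can be any object that
    supports slicing and returns array-like data, e.g. an h5py.Dataset.
    """

    def __init__(self, source, dtype):
        self._source = source
        self._dtype = np.dtype(dtype)

    @property
    def dtype(self):
        # Report the converted type rather than the stored one.
        return self._dtype

    @property
    def shape(self):
        return self._source.shape

    def __len__(self):
        return len(self._source)

    def __getitem__(self, key):
        # Only the requested slice is read from the source, then converted.
        return np.asarray(self._source[key]).astype(self._dtype)


# A NumPy array stands in for the uint16 HDF5 dataset here.
intdata = np.array([81, 65, 33, 22, 67], dtype='uint16')

view = CastView(intdata, 'float32')
first_two = view[:2]  # reads two elements and converts just those

print(view.dtype, first_two.dtype, first_two)
```

With a real file you would pass the dataset instead of the array, e.g. `CastView(hf['data'], 'float32')`, and hand the view to other functions. Also worth knowing: as far as I'm aware, h5py 3.0 and later let you index the object returned by `d.astype('float32')` directly (for example `d.astype('float32')[:5]`), and the `._local.astype` approach above relies on private internals that may not exist in those versions, so check the docs for the version you have installed.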