Reading an HDF5-format MATLAB file in Python with h5py

Question

I am trying to read an HDF5-format MATLAB file in Python using the h5py library. The file is called "Q_visSDF_accurate.mat" and has two keys: "filename" and "sdf". "filename" contains a cell array of strings; "sdf" is a [6001, 49380] matrix of floats. I had no problem extracting the variable sdf with the following code:

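import h5py

# A MATLAB v7.3 .mat file is an HDF5 file, so h5py can open it directly
data = h5py.File("Q_visSDF_accurate.mat", 'r')
# Slicing the dataset reads it from disk into a NumPy array
sdf = data.get("sdf")[:,:]
sdf = sdf.astype(float)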
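MATLAB stores arrays in column-major order, so h5py typically reports a v7.3 variable with its dimensions reversed: the [6001, 49380] sdf matrix would appear with shape (49380, 6001). The Octave session in the answer shows the same effect on a 3 x 3 matrix. The following is a minimal sketch that assumes this layout. It builds a tiny in-memory stand-in file (the name "demo.mat" and the 2 x 3 values are hypothetical) and transposes the data after reading to recover MATLAB's orientation:

```python
import h5py
import numpy as np

# Hypothetical stand-in for the MATLAB 2 x 3 matrix [1 2 3; 4 5 6].
# Assumption: MATLAB writes it column-major, so h5py sees shape (3, 2).
matlab_matrix = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# driver="core" with backing_store=False keeps the file entirely in memory
with h5py.File("demo.mat", "w", driver="core", backing_store=False) as f:
    f.create_dataset("sdf", data=matlab_matrix.T)  # mimic the on-disk layout

    dset = f["sdf"]
    print(dset.shape, dset.dtype)  # inspect before loading any data

    # dset[()] reads the whole dataset; .T restores MATLAB's row/column order
    sdf = dset[()].T.astype(float)

print(sdf.shape)
```

For the real file, check dset.shape first. Slicing a dataset reads only the selected region, which helps when a matrix as large as sdf does not fit comfortably in memory.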

However, I can't read the filename variable. I tried:

filename = data.get("filename")[0]

But the code returns:

array([<HDF5 object reference>, <HDF5 object reference>,
       <HDF5 object reference>, ..., <HDF5 object reference>,
       <HDF5 object reference>, <HDF5 object reference>], dtype=object)

How can I dereference the contents of the filename variable? Using the hdf5storage package is not a solution, as it works only with 32-bit Python and can only read a subset of MATLAB variables.
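Each element of filename is an h5py object reference. Indexing the open file with a reference (data[ref]) returns the object it points to. The following is a minimal sketch that assumes the usual MATLAB v7.3 layout: each string is a char array stored as uint16 character codes, and the referenced data lives under a "#refs#" group. It builds a small in-memory stand-in file and decodes the references back into Python strings. The helper name read_cell_of_strings and the sample file names are hypothetical:

```python
import h5py
import numpy as np


def read_cell_of_strings(f, key):
    """Dereference each object reference in f[key] and decode it as text."""
    strings = []
    for ref in f[key][()].ravel():  # flatten the (1, N) or (N, 1) cell
        codes = f[ref][()]  # indexing the file with a reference dereferences it
        # Assumption: MATLAB char data is stored as uint16 code units
        strings.append("".join(chr(int(c)) for c in codes.ravel()))
    return strings


# Hypothetical stand-in for a MATLAB v7.3 file holding a cell of file names
names = ["img_001.png", "img_002.png", "img_003.png"]
with h5py.File("demo.mat", "w", driver="core", backing_store=False) as f:
    refs_group = f.create_group("#refs#")
    cell = f.create_dataset("filename", (1, len(names)), dtype=h5py.ref_dtype)
    for i, name in enumerate(names):
        # Assumption: a 1 x L MATLAB char row shows up in h5py as shape (L, 1)
        codes = np.array([[ord(c)] for c in name], dtype=np.uint16)
        cell[0, i] = refs_group.create_dataset(f"item{i}", data=codes).ref

    filenames = read_cell_of_strings(f, "filename")

print(filenames)  # ['img_001.png', 'img_002.png', 'img_003.png']
```

On the real file, the equivalent call would be read_cell_of_strings(data, "filename") while data is still open.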

Recommended answer

In Octave, I created a file containing a cell array and a matrix:

>> xmat = [1,2,3;4,5,6;7,8,9];
>> xcell = {1,2,3;4,5,6;7,8,9};
>> save -hdf5 testmat.h5 xmat xcell

In IPython with h5py, I find that this file contains two groups:

In [283]: F = h5py.File('../testmat.h5','r')
In [284]: list(F.keys())
Out[284]: ['xcell', 'xmat']
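Instead of calling keys() level by level, you can use h5py's visititems to walk the whole hierarchy in one pass. The following is a minimal sketch using a hypothetical in-memory file that mimics the Octave layout described here (an xmat group with type and value members). It prints every group and dataset, with shape and dtype for datasets:

```python
import h5py
import numpy as np

seen = []


def describe(name, obj):
    """visititems callback: receives each member's path and the object."""
    seen.append(name)
    if isinstance(obj, h5py.Dataset):
        print(f"{name}: dataset, shape={obj.shape}, dtype={obj.dtype}")
    else:
        print(f"{name}: group")
    # Returning None (implicitly) lets the traversal continue


# Hypothetical in-memory file modeled on the Octave layout shown here
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as F:
    F.create_dataset("xmat/type", data=np.bytes_("matrix"))
    F.create_dataset("xmat/value", data=np.arange(1.0, 10.0).reshape(3, 3))
    F.visititems(describe)
```

On the real file, F.visititems(describe) lists xcell/value/_0/value and the other nested members without any manual digging.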

The matrix group has type and value datasets:

In [285]: F['xmat']
Out[285]: <HDF5 group "/xmat" (2 members)>
In [286]: list(F['xmat'].keys())
Out[286]: ['type', 'value']
In [287]: F['xmat']['type']
Out[287]: <HDF5 dataset "type": shape (), type "|S7">
In [288]: F['xmat']['value']
Out[288]: <HDF5 dataset "value": shape (3, 3), type "<f8">
In [289]: F['xmat']['value'][:]
Out[289]: 
array([[ 1.,  4.,  7.],
       [ 2.,  5.,  8.],
       [ 3.,  6.,  9.]])

The cell has the same type and value members, but here value is another group:

In [291]: F['xcell']['type']
Out[291]: <HDF5 dataset "type": shape (), type "|S5">
In [292]: F['xcell']['value']
Out[292]: <HDF5 group "/xcell/value" (10 members)>

In [294]: list(F['xcell']['value'].keys())
Out[294]: ['_0', '_1', '_2', '_3', '_4', '_5', '_6', '_7', '_8', 'dims']
...
In [296]: F['xcell']['value']['dims'][:]
Out[296]: array([3, 3])
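The flat _0 ... _8 members plus dims are enough to rebuild the cell as a NumPy object array. The following is a minimal sketch built on two assumptions. First, each _k group holds a scalar value dataset, as in the _0 example. Second, the members are numbered in MATLAB/Octave column-major order. The helper name read_octave_cell and the in-memory stand-in file are hypothetical:

```python
import h5py
import numpy as np


def read_octave_cell(value_group):
    """Rebuild an Octave cell array from its value group (assumed layout)."""
    dims = tuple(int(d) for d in value_group["dims"][()])
    # Sort numerically so that _10 comes after _9, not after _1
    keys = sorted((k for k in value_group if k != "dims"), key=lambda k: int(k[1:]))
    items = np.empty(len(keys), dtype=object)
    for i, k in enumerate(keys):
        items[i] = value_group[k]["value"][()]  # [()] reads a 0-d dataset
    # Assumption: elements are numbered in column-major (Fortran) order
    return items.reshape(dims, order="F")


xcell = np.arange(1.0, 10.0).reshape(3, 3)  # stand-in for {1,2,3;4,5,6;7,8,9}
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as F:
    value = F.create_group("xcell").create_group("value")
    value.create_dataset("dims", data=np.array([3, 3]))
    for i, x in enumerate(xcell.ravel(order="F")):
        value.create_dataset(f"_{i}/value", data=x)  # scalar, like _0/value
    cell = read_octave_cell(value)

print(cell)
```

Cells that hold strings or nested cells would need a type check on each _k group before reading its value.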

I had to use [...] to fetch the value of a cell element, since it is stored as a 0-d (scalar) dataset:

In [301]: F['xcell']['value']['_0']['value'][...]
Out[301]: array(1.0)
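h5py rejects ordinary slices such as [:] on a scalar dataspace, which is why [...] is needed here. The following small sketch uses a hypothetical in-memory file to compare the two usual ways to read such a dataset: [...] returns a 0-d NumPy array, while [()] returns a NumPy scalar:

```python
import h5py
import numpy as np

with h5py.File("demo.h5", "w", driver="core", backing_store=False) as F:
    dset = F.create_dataset("_0/value", data=1.0)  # scalar dataset, shape ()

    as_array = dset[...]  # 0-d ndarray, like the array(1.0) output above
    as_scalar = dset[()]  # NumPy scalar (numpy.float64)

print(type(as_array), as_array.shape)
print(type(as_scalar), float(as_scalar))
```

[()] is usually the more convenient choice when you want a plain number; calling .item() on the 0-d array gives the same result.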

To really replicate the question I should have created cells with string values, but I think this illustrates well enough how cells are stored: as named members (_0, _1, ...) within the value group, alongside a dims dataset.

I'm assuming the Octave h5 storage is compatible with MATLAB's.
