h5py, access data in Datasets in SVHN
Problem description
I want to read the Street View House Numbers (SVHN) dataset using h5py:
```
In [117]: def printname(name):
     ...:     print(name)
     ...:

In [118]: data['/digitStruct'].visit(printname)
bbox
name
```
There are two objects in the data, `bbox` and `name`: `name` corresponds to the file-name data, and `bbox` corresponds to the width, height, top, left and label data.
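`visit` only passes names, so it cannot show what kind of object `bbox` and `name` are. A minimal sketch, assuming a recent h5py (2.10 or later, for `h5py.ref_dtype` and `h5py.check_ref_dtype`): `visititems` also passes the object itself, which lets you check whether each entry is a group or a dataset and whether it holds object references. The in-memory file below is a hypothetical stand-in for `digitStruct.mat`, not the real data.

```python
import h5py
import numpy as np

# Hypothetical in-memory stand-in for digitStruct.mat (not the real file):
# a 'digitStruct' group with 'bbox' and 'name' datasets of object references.
with h5py.File("inspect_demo.h5", "w", driver="core", backing_store=False) as f:
    target = f.create_dataset("target", data=np.zeros((1, 1)))
    group = f.create_group("digitStruct")
    for key in ("bbox", "name"):
        dset = group.create_dataset(key, (3, 1), dtype=h5py.ref_dtype)
        for i in range(3):
            dset[i, 0] = target.ref

    summary = []

    def describe(name, obj):
        # visititems passes the path *and* the object, so we can tell
        # groups from datasets and check whether a dataset holds references.
        if isinstance(obj, h5py.Dataset):
            holds_refs = h5py.check_ref_dtype(obj.dtype) is not None
            summary.append((name, "dataset", obj.shape, holds_refs))
        else:
            summary.append((name, "group", None, False))

    f["digitStruct"].visititems(describe)

for row in summary:
    print(row)
```

Running the same `describe` callback on the real file should reveal whether `bbox` and `name` are datasets of references, which explains the output further down.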
How can I access all the data in `name` and `bbox`?
I have tried the following code from the docs, but it just returns HDF5 object references:
```
In [119]: for i in data['/digitStruct/name']:
     ...:     print(i[0])
     ...:
     ...:
<HDF5 object reference>
<HDF5 object reference>
<HDF5 object reference>
<HDF5 object reference>
<HDF5 object reference>
<HDF5 object reference>
```
Python version: 3.5 and OS: Windows 10.
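Each `<HDF5 object reference>` printed above is an `h5py.Reference`, and h5py dereferences it when you index the file with it (`f[ref]`). A minimal sketch on a hypothetical in-memory file, assuming h5py 2.10 or later for `h5py.ref_dtype`; the one-character-code-per-row layout mirrors what the answer's `get_name` expects:

```python
import h5py
import numpy as np

# Hypothetical in-memory demo: two datasets holding file names as one
# uint16 character code per row, plus a (2, 1) dataset of references to them.
with h5py.File("ref_demo.h5", "w", driver="core", backing_store=False) as f:
    a = f.create_dataset("a", data=np.array([[ord(c)] for c in "1.png"], dtype=np.uint16))
    b = f.create_dataset("b", data=np.array([[ord(c)] for c in "2.png"], dtype=np.uint16))
    names = f.create_dataset("names", (2, 1), dtype=h5py.ref_dtype)
    names[0, 0] = a.ref
    names[1, 0] = b.ref

    decoded = []
    for row in f["names"]:
        ref = row[0]        # an h5py.Reference, printed as <HDF5 object reference>
        target = f[ref]     # indexing the file with a reference returns the target object
        codes = target[()]  # read the whole target dataset into a NumPy array
        decoded.append("".join(chr(int(c[0])) for c in codes))

print(decoded)
```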
Recommended answer
I'll answer my own question here. After reading the h5py docs, here is my code:
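```python
import h5py


def get_box_data(index, hdf5_data):
    """
    Get `height`, `label`, `left`, `top` and `width` of each digit in a picture.
    :param index: position of the picture in /digitStruct/bbox
    :param hdf5_data: the opened h5py.File of digitStruct.mat
    :return: dict mapping each field name to a list with one value per digit
    """
    meta_data = dict()
    meta_data['height'] = []
    meta_data['label'] = []
    meta_data['left'] = []
    meta_data['top'] = []
    meta_data['width'] = []

    def collect_attr(name, obj):
        # called by visititems once per field dataset in the bbox group
        vals = []
        if obj.shape[0] == 1:
            # a single digit: the value is stored directly in the dataset
            vals.append(int(obj[0][0]))
        else:
            # several digits: each row is an object reference to the value
            for k in range(obj.shape[0]):
                vals.append(int(hdf5_data[obj[k][0]][0][0]))
        meta_data[name] = vals

    # each entry of /digitStruct/bbox is a reference to a group of fields
    box = hdf5_data['/digitStruct/bbox'][index]
    hdf5_data[box[0]].visititems(collect_attr)
    return meta_data


def get_name(index, hdf5_data):
    name = hdf5_data['/digitStruct/name']
    # the file name is stored as one character code per row; `[()]` reads
    # the whole dataset (`.value` was removed in h5py 3.0)
    return ''.join([chr(int(v[0])) for v in hdf5_data[name[index][0]][()]])
```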
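The functions above rely on two layouts inside each bbox group: with one digit a field holds its value inline, with several digits each row is a reference to the value. A minimal, self-contained sketch that builds a tiny hypothetical file with both cases and reads it back the same way (`build_fake_digitstruct`, `read_name`, `read_bbox` and the `refs` group are illustrative names, not part of SVHN or h5py; assumes h5py 2.10 or later):

```python
import h5py
import numpy as np


def build_fake_digitstruct(f):
    """Hypothetical test layout: image 0 is '1.png' with one digit,
    image 1 is '23.png' with two digits."""
    ds = f.create_group("digitStruct")
    store = f.create_group("refs")  # holds the referenced objects
    names = ds.create_dataset("name", (2, 1), dtype=h5py.ref_dtype)
    bbox = ds.create_dataset("bbox", (2, 1), dtype=h5py.ref_dtype)
    for i, (fname, labels) in enumerate([("1.png", [1]), ("23.png", [2, 3])]):
        codes = np.array([[ord(c)] for c in fname], dtype=np.uint16)
        names[i, 0] = store.create_dataset("name%d" % i, data=codes).ref
        box = store.create_group("box%d" % i)
        if len(labels) == 1:
            # one digit: the value sits inline in a (1, 1) dataset
            box.create_dataset("label", data=np.array([[float(labels[0])]]))
        else:
            # several digits: row k references a (1, 1) dataset with the k-th value
            label = box.create_dataset("label", (len(labels), 1), dtype=h5py.ref_dtype)
            for k, v in enumerate(labels):
                value = store.create_dataset("label%d_%d" % (i, k), data=np.array([[float(v)]]))
                label[k, 0] = value.ref
        bbox[i, 0] = box.ref


def read_name(f, index):
    ref = f["digitStruct/name"][index, 0]
    return "".join(chr(int(c[0])) for c in f[ref][()])


def read_bbox(f, index):
    group = f[f["digitStruct/bbox"][index, 0]]
    fields = {}
    for key, dset in group.items():
        if dset.shape[0] == 1:
            fields[key] = [int(dset[0, 0])]
        else:
            fields[key] = [int(f[dset[k, 0]][0, 0]) for k in range(dset.shape[0])]
    return fields


with h5py.File("fake_digitstruct.h5", "w", driver="core", backing_store=False) as f:
    build_fake_digitstruct(f)
    results = [(read_name(f, i), read_bbox(f, i)) for i in range(f["digitStruct/name"].shape[0])]

print(results)
```

Checking `shape[0] == 1` follows the answer's own heuristic; a file that stored a single digit as a one-element reference array would need an extra dtype check.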
`hdf5_data` is the opened file, e.g. `train_data = h5py.File('./train/digitStruct.mat', 'r')`, and it works!
Here is some sample code that uses the two functions above:
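```python
import os
import h5py
import tqdm
folder = './train'  # e.g. the directory that contains digitStruct.mat
mat_data = h5py.File(os.path.join(folder, 'digitStruct.mat'), 'r')
size = mat_data['/digitStruct/name'].size
for _i in tqdm.tqdm(range(size)):
    pic = get_name(_i, mat_data)
    box = get_box_data(_i, mat_data)
```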
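The loop above overwrites `pic` and `box` on every iteration. If you want to keep the results, one option is to flatten each picture into one record per digit. A minimal sketch; `to_digit_rows` is a hypothetical helper, and the example values are made up rather than taken from SVHN:

```python
def to_digit_rows(pic, box):
    """Hypothetical helper: turn one picture's name and the dict returned by
    get_box_data into one flat record per digit."""
    fields = ("label", "left", "top", "width", "height")
    n_digits = len(box["label"])
    return [dict(name=pic, **{key: box[key][j] for key in fields}) for j in range(n_digits)]


# Made-up example values for a two-digit picture (not real SVHN data)
example_box = {"height": [30, 30], "label": [2, 3], "left": [10, 25], "top": [5, 6], "width": [12, 13]}
rows = to_digit_rows("23.png", example_box)
for r in rows:
    print(r)
```

Inside the loop, `all_rows.extend(to_digit_rows(pic, box))` (with `all_rows = []` created before it) would collect every digit of every picture.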
The functions above show how to get the name and the bbox data of each entry in the data!