h5py randomly unable to open object (component not found)


Problem description

I'm trying to load hdf5 datasets inside a PyTorch training loop.

Regardless of num_workers in the dataloader, this randomly throws "KeyError: 'Unable to open object (component not found)'" (traceback below).

I'm able to start the training loop, but not able to get through a quarter of one epoch without this error, which happens for random 'datasets' (each a 2d array). I can separately load these same arrays in the console using the regular f['group/subroup'][()], so it doesn't appear that the hdf file is corrupted or that there's anything wrong with the datasets/arrays.

Things I've tried:

  • Adjusting num_workers, based on various other issues people have had with pytorch - it still happens with 0 num_workers.
  • Upgrading/downgrading the torch, numpy and python versions.
  • Using f.close() at the end of the dataloader's __getitem__.
  • Using a fresh conda env and installing dependencies.
  • Calling the parent group first, then initialising the array, e.g.: X = f[ID], then X = X[()].
  • Using double slashes in the hdf paths.

因为这种情况以num_workers = 0重复出现,所以我认为这不是多线程问题,尽管回溯似乎指向/torch/utils/data/dataloader中准备下一批的行.

Because this recurs with num_workers=0, I figure it's not a multithreading issue although the traceback seems to point to lines from /torch/utils/data/dataloader that prep the next batch.

I just can't figure out why h5py randomly can't see the odd individual dataset.

IDs are strings matching hdf paths, e.g.: ID = "ID_12345//Ep_-1//AN_67891011//ABC"
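As a side note, HDF5 treats consecutive slashes in a path the same as a single slash, so the double-slash IDs above are not themselves a problem. A minimal sketch (using a hypothetical in-memory file, not the asker's data) showing this, plus the membership test that gives a clearer failure than h5o.open's KeyError:

```python
import h5py
import numpy as np

# Hypothetical in-memory file (driver='core', backing_store=False keeps it off disk)
f = h5py.File("demo.h5", "w", driver="core", backing_store=False)

# create_dataset creates the intermediate groups automatically
f.create_dataset("ID_12345/Ep_-1/AN_67891011/ABC", data=np.zeros((4, 4)))

# Double slashes collapse to single slashes in HDF5 paths,
# so both spellings resolve to the same dataset:
a = f["ID_12345//Ep_-1//AN_67891011//ABC"][()]
b = f["ID_12345/Ep_-1/AN_67891011/ABC"][()]
assert (a == b).all()

# Membership tests let you fail fast with a readable error
# instead of the opaque 'component not found' KeyError:
assert "ID_12345/Ep_-1/AN_67891011/ABC" in f
assert "ID_12345/Ep_-1/MISSING" not in f
```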

Excerpt from the dataloader:

    def __getitem__(self, index):
        ID = self.list_IDs[index]

        # Open hdf file in read mode (SWMR so readers tolerate a concurrent writer):
        f = h5py.File(self.hdf_file, 'r', libver='latest', swmr=True)

        X = f[ID][()]
        X = X[:, :, np.newaxis]  # torchvision 0.2.1 needs (H x W x C) for transforms

        y = self.y_list[index]

        if self.transform:
            X = self.transform(X)

        return ID, X, y
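A common variant of this pattern (a sketch, not the asker's actual class; the names H5Dataset, hdf_file and list_IDs are assumptions carried over from the excerpt) opens the file lazily, once per worker process rather than once per item, and checks the key before reading so a missing path surfaces as a readable error:

```python
import h5py

class H5Dataset:
    """Minimal sketch of an HDF5-backed dataset. Opening the file inside
    __getitem__ (after any DataLoader worker fork) avoids sharing one
    handle across processes."""

    def __init__(self, hdf_file, list_IDs):
        self.hdf_file = hdf_file
        self.list_IDs = list_IDs
        self._f = None  # opened lazily on first __getitem__

    def __getitem__(self, index):
        if self._f is None:
            self._f = h5py.File(self.hdf_file, 'r', libver='latest', swmr=True)
        ID = self.list_IDs[index]
        if ID not in self._f:
            # Clearer than h5o.open's bare 'component not found'
            raise KeyError(f"dataset path not found in file: {ID!r}")
        return ID, self._f[ID][()]

    def __len__(self):
        return len(self.list_IDs)
```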

Expected: the training loop runs to completion.

Actual: IDs / datasets / examples are loaded fine initially, then after between 20 and 200 steps...

Traceback (most recent call last):
  File "Documents/BSSA-loc/mamdl/models/main_v3.py", line 287, in <module>
    main()
  File "Documents/BSSA-loc/mamdl/models/main_v3.py", line 203, in main
    for i, (IDs, images, labels) in enumerate(train_loader):
  File "/home/james/anaconda3/envs/jc/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 615, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/james/anaconda3/envs/jc/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 615, in <listcomp>
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/james/Documents/BSSA-loc/mamdl/src/data_loading/Data_loader_v3.py", line 59, in __getitem__
    X = f[ID][()]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/james/anaconda3/envs/jc/lib/python3.7/site-packages/h5py/_hl/group.py", line 262, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: 'Unable to open object (component not found)'

Accepted answer

For the record, my best guess is that this was due to a bug in my hdf construction code, which was stopped and started multiple times in append mode. Some datasets appeared as though they were complete when queried with f['group/subroup'][()], but could not be loaded with the pytorch dataloader.

Haven't had this issue since rebuilding the hdf differently.
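When rebuilding a file like this, it can help to validate it afterwards by forcing a full read of every expected dataset, not just listing keys. A minimal sketch (the function name verify_hdf and the expected_ids parameter are illustrative, assuming expected_ids is the same ID list the dataloader consumes):

```python
import h5py

def verify_hdf(path, expected_ids):
    """Check that every expected dataset path exists AND is readable.
    Returns (missing, unreadable) lists of dataset paths."""
    missing, unreadable = [], []
    with h5py.File(path, 'r') as f:
        for ID in expected_ids:
            if ID not in f:
                missing.append(ID)
                continue
            try:
                f[ID][()]  # force a full read, not just a key lookup
            except (KeyError, OSError):
                unreadable.append(ID)
    return missing, unreadable
```

Running this once after construction, with the exact ID list the training loop will use, catches both paths that were never written and entries left half-written by an interrupted append.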
