Getting the RuntimeError: Unable to create link (name already exists) with a multi-input Keras model
Problem description
I'm unable to save a Keras model because I get the error mentioned in the title. I have been using tensorflow-gpu. My model has 4 inputs, each of which goes through a ResNet50. With a single input the callbacks below worked perfectly, but with multiple inputs I get the following error:
RuntimeError: Unable to create link (name already exists)
callbacks = [
    EarlyStopping(monitor='val_loss', patience=30, mode='min', min_delta=0.0001, verbose=1),
    ModelCheckpoint(checkpoint_path, monitor='val_loss', save_best_only=True, mode='min', verbose=1),
]
Without the callbacks I couldn't save the model at the end of training either, because I got the same error, but I was able to fix that with this code found here:
import tensorflow as tf
from tensorflow.python.keras import backend as K

# Recreate each optimizer weight under an index-based name: variable0, variable1, ...
with K.name_scope(model.optimizer.__class__.__name__):
    for i, var in enumerate(model.optimizer.weights):
        name = 'variable{}'.format(i)
        model.optimizer.weights[i] = tf.Variable(var, name=name)
This code only works with the single-input model, and it has to be placed after the call to model.fit.
With the callbacks, even the above code does not work. This post is related to my previous one.
I have read that this issue can be fixed with tf-nightly, so I tried that, but it didn't work.
I tested standalone code with generated data in Google Colab and it worked. So I checked the TF version there, and it is the same as mine: 2.3.0. As for CUDA, both Colab and my machine are running:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
Could this be the issue?
Update:
Here is the error output:
113/113 [==============================] - ETA: 0s - loss: 30.0107 - mae: 1.3525
Epoch 00001: val_loss improved from inf to 0.18677, saving model to saved_models/multi_channel_model.h5
Traceback (most recent call last):
  File "fine_tuning.py", line 111, in <module>
    run()
  File "fine_tuning.py", line 104, in run
    model.fit(x=train_x_list, y=train_y, validation_split=0.2,
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1137, in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py", line 412, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py", line 1249, in on_epoch_end
    self._save_model(epoch=epoch, logs=logs)
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py", line 1301, in _save_model
    self.model.save(filepath, overwrite=True, options=self._options)
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1978, in save
    save.save_model(self, filepath, overwrite, include_optimizer, save_format,
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/save.py", line 130, in save_model
    hdf5_format.save_model_to_hdf5(
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 125, in save_model_to_hdf5
    save_optimizer_weights_to_hdf5_group(f, model.optimizer)
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 593, in save_optimizer_weights_to_hdf5_group
    param_dset = weights_group.create_dataset(
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/h5py/_hl/group.py", line 139, in create_dataset
    self[name] = dset
  File "/home/abderrezzaq/.local/lib/python3.8/site-packages/h5py/_hl/group.py", line 373, in __setitem__
    h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 202, in h5py.h5o.link
RuntimeError: Unable to create link (name already exists)
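The traceback ends in create_dataset, called from save_optimizer_weights_to_hdf5_group, which suggests that two optimizer weights are being written under the same name in one HDF5 group. Below is a minimal sketch for spotting such collisions, assuming the names are available as a plain list of strings. find_duplicate_names is a hypothetical helper, not part of Keras or h5py, and the example names are made up.

```python
from collections import Counter


def find_duplicate_names(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    seen = set()
    duplicates = []
    for name in names:
        if counts[name] > 1 and name not in seen:
            seen.add(name)
            duplicates.append(name)
    return duplicates


# Made-up weight names, for illustration only.
example = ["conv1/kernel/m:0", "conv1/kernel/v:0", "conv1/kernel/m:0"]
print(find_duplicate_names(example))
```

With a real model, the list could be built as [w.name for w in model.optimizer.weights]. A non-empty result would be consistent with the error above.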
Try with CUDA 10.1. https://www.tensorflow.org/install/gpu says "TensorFlow supports CUDA® 10.1"
Something is wrong with the ModelCheckpoint callback. Check the checkpoint_path location: is it writable? Also, the reference says "if save_best_only=True, the latest best model according to the quantity monitored will not be overwritten." So you may want to delete the last saved model, or provide a new unique name in checkpoint_path every time you run the model. Most likely it prevents overwriting the previous model and throws the error.
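The two suggestions above (check that the location is writable, and use a fresh name for each run) can be sketched with the standard library alone. unique_checkpoint_path and is_writable_dir are hypothetical helpers, and the timestamp format is an arbitrary choice. Two runs started within the same second would still get the same name.

```python
import os
from datetime import datetime


def is_writable_dir(path):
    """True if the directory that would contain `path` exists and is writable."""
    directory = os.path.dirname(os.path.abspath(path))
    return os.path.isdir(directory) and os.access(directory, os.W_OK)


def unique_checkpoint_path(base_path, now=None):
    """Insert a timestamp before the file extension of `base_path`."""
    now = now or datetime.now()
    root, ext = os.path.splitext(base_path)
    return "{}_{}{}".format(root, now.strftime("%Y%m%d-%H%M%S"), ext)


checkpoint_path = unique_checkpoint_path("saved_models/multi_channel_model.h5")
print(checkpoint_path, is_writable_dir(checkpoint_path))
```

The resulting string can then be passed as the first argument of ModelCheckpoint in place of the fixed checkpoint_path.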