Resnet50 does not converge. VGG16 works fine


Problem description

I trained a regression network using resnet50 as the backbone. The input is an image of size 224*224*3, and the output is a single value ranging from 0 to 1.

But the network does not converge, no matter whether I use sigmoid or relu as the output layer's activation, or mae or mse as the loss function.

For example, I use resnet50 as the backbone, mae as the loss function, sigmoid as the output layer's activation function, and SGD as the optimizer. The training loss is:

Epoch 1 training loss is 0.4900, val_loss is 0.4797

Epoch 2 training loss is 0.4923, val_loss is 0.4794

Epoch 3 training loss is 0.4923, val_loss is 0.4783

...

Epoch 35 training loss is 0.4923, val_loss is 0.4771

The training loss does not change; it stays constant at 0.4923. The val_loss is always about 0.47. I tested different optimizers and learning rates, but the network still does not converge.

When I use VGG16 or Mobilenet as the backbone, the network converges. Could anyone give me some suggestions on how to fix this problem?

Solution

Can you somehow validate whether the Resnet50 backbone is correctly implemented? Maybe try to train it on MNIST and see if it works in general.

It kind of seems to me that the ResNet variant just outputs some mean value instead of learning the actual problem.
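A quick way to test that suspicion is to compare the model's MAE with what you would get by always predicting one constant (the mean or median of the targets), and to check whether the predictions vary at all. Below is a minimal sketch in plain Python. The helper names and the sample values are made up for illustration; in practice `y_true` would be your labels and `y_pred` the model's predictions on the same samples.

```python
from statistics import mean, median


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def constant_baselines(y_true):
    """MAE obtained by always predicting the mean or the median of the targets."""
    n = len(y_true)
    return {
        "mean": mae(y_true, [mean(y_true)] * n),
        "median": mae(y_true, [median(y_true)] * n),
    }


def looks_constant(y_pred, tol=1e-3):
    """True if all predictions lie within `tol` of each other."""
    return max(y_pred) - min(y_pred) <= tol


# Hypothetical targets and predictions, for illustration only.
y_true = [0.0, 0.25, 0.5, 0.75, 1.0]
y_pred = [0.5, 0.5, 0.5, 0.5, 0.5]

print("constant baselines:", constant_baselines(y_true))
print("model MAE:", mae(y_true, y_pred))
print("predictions constant:", looks_constant(y_pred))
```

If the model's MAE is no better than the constant baseline and `looks_constant` returns True, the network is effectively ignoring its input.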

Can you give some more information on what you want to achieve? What does your regression look like, and what input does the backbone expect? Also, you might want to look at similar work (if it exists) and read which architectures and hyperparameters they used.
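One more thing that might be worth ruling out, and this is only a guess from the numbers, not something confirmed by the post: a loss pinned at exactly the same value would be consistent with a saturated output unit. A sigmoid with a large positive or negative pre-activation, or a relu with a negative pre-activation, passes almost no gradient back, so the weights stop moving. The sketch below just evaluates the standard derivatives at a few hypothetical pre-activation values `z`.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the logistic function: s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)


def relu_grad(z):
    """Derivative of relu: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0


# Hypothetical pre-activation values of the output unit.
for z in (-20.0, -5.0, 0.0, 5.0, 20.0):
    print(
        f"z={z:+6.1f}  sigmoid={sigmoid(z):.6f}  "
        f"d_sigmoid={sigmoid_grad(z):.2e}  d_relu={relu_grad(z):.0f}"
    )
```

Printing the raw output-layer pre-activations for a batch would show whether they sit in such a flat region.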
