Ada-Delta method doesn't converge when used in Denoising AutoEncoder with MSE loss & ReLU activation?


Problem description

I just implemented AdaDelta (http://arxiv.org/abs/1212.5701) for my own deep neural network library. The paper says that SGD with AdaDelta is not sensitive to hyperparameters and that it always converges to somewhere good (at least, the final reconstruction loss of AdaDelta-SGD is comparable to that of a well-tuned Momentum method).
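For reference, this is roughly the per-parameter update rule from the paper, as a minimal NumPy sketch; the class and attribute names (`AdaDelta`, `acc_grad`, `acc_delta`) are my own and not from any particular library:

```python
import numpy as np

class AdaDelta:
    """Per-parameter AdaDelta update (Zeiler 2012) -- a minimal sketch.

    rho and eps are the only hyperparameters; the paper uses values
    such as rho=0.95 and eps=1e-6.
    """

    def __init__(self, shape, rho=0.95, eps=1e-6):
        self.rho, self.eps = rho, eps
        self.acc_grad = np.zeros(shape)   # running average of squared gradients E[g^2]
        self.acc_delta = np.zeros(shape)  # running average of squared updates   E[dx^2]

    def step(self, grad):
        # Accumulate gradient: E[g^2] = rho * E[g^2] + (1 - rho) * g^2
        self.acc_grad = self.rho * self.acc_grad + (1 - self.rho) * grad ** 2
        # Compute update: dx = -(RMS[dx]_{t-1} / RMS[g]_t) * g
        rms_delta = np.sqrt(self.acc_delta + self.eps)
        rms_grad = np.sqrt(self.acc_grad + self.eps)
        delta = -(rms_delta / rms_grad) * grad
        # Accumulate updates: E[dx^2] = rho * E[dx^2] + (1 - rho) * dx^2
        self.acc_delta = self.rho * self.acc_delta + (1 - self.rho) * delta ** 2
        return delta  # add this to the parameter

# Usage (one optimizer state per weight tensor): w += opt_w.step(grad_w)
```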

When I used AdaDelta-SGD as the learning method in a Denoising AutoEncoder, it did converge in some specific settings, but not always. When I used MSE as the loss function and Sigmoid as the activation function, it converged very quickly, and after 100 epochs the final reconstruction loss was better than that of plain SGD, SGD with Momentum, and AdaGrad.
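For concreteness, the setup is roughly the following: a tied-weight denoising autoencoder with masking noise, Sigmoid activations, and MSE against the clean input. This is only an illustrative NumPy sketch; the function and parameter names (`dae_mse_loss`, `W`, `b`, `b_prime`, `corruption`) are assumptions, not my actual library code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dae_mse_loss(x, W, b, b_prime, corruption=0.3, rng=None):
    rng = rng or np.random.default_rng(0)
    # Corrupt the input with masking noise (zero out a fraction of entries).
    mask = rng.random(x.shape) > corruption
    x_tilde = x * mask
    # Encode and decode with tied weights; sigmoid keeps the reconstruction in (0, 1).
    h = sigmoid(x_tilde @ W + b)
    x_hat = sigmoid(h @ W.T + b_prime)
    # Mean squared reconstruction error against the clean input.
    return np.mean((x_hat - x) ** 2)
```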

But when I used ReLU as the activation function, it didn't converge; instead it stayed stuck (oscillating) at a high (bad) reconstruction loss, just as when you use plain SGD with a very high learning rate. The reconstruction loss it got stuck at was about 10 to 20 times higher than the final reconstruction loss reached with the Momentum method.

I really don't understand why this happens, since the paper says AdaDelta just works. Please let me know the reason behind this phenomenon and how I can avoid it.

Recommended answer

The activation of a ReLU is unbounded, which makes it difficult to use in autoencoders, since your training vectors likely do not have arbitrarily large, unbounded responses. ReLU simply isn't a good fit for that type of network.

You can force a ReLU into an autoencoder by applying some transformation to the output layer, as is done here. However, they don't discuss the quality of the results as an autoencoder, only as a pre-training method for classification, so it's not clear that it's a worthwhile approach for building an autoencoder either.
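As a rough illustration of that idea, here is a minimal NumPy sketch of a ReLU encoder combined with a bounded (sigmoid) output layer so reconstructions stay in the same range as inputs scaled to [0, 1]. The parameter names (`W_enc`, `b_enc`, `W_dec`, `b_dec`) are illustrative, and this is not necessarily the exact transformation used in the linked work:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x_tilde, W_enc, b_enc, W_dec, b_dec):
    # ReLU hidden units: unbounded, sparse codes.
    h = relu(x_tilde @ W_enc + b_enc)
    # Bounded output layer so the reconstruction matches the input range;
    # a linear output with MSE is another common choice for unnormalized inputs.
    return sigmoid(h @ W_dec + b_dec)
```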
