Ada-Delta method doesn't converge when used in Denoising AutoEncoder with MSE loss & ReLU activation?


Problem description

I just implemented AdaDelta (http://arxiv.org/abs/1212.5701) for my own deep neural network library. The paper says that SGD with AdaDelta is not sensitive to hyperparameters and that it always converges to somewhere good (at least, the final reconstruction loss of AdaDelta-SGD is comparable to that of a well-tuned Momentum method).
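For reference, this is roughly the per-parameter update rule from the paper, as a minimal NumPy sketch; the class and attribute names (`AdaDelta`, `acc_grad`, `acc_delta`) are my own and not from any particular library:

```python
import numpy as np

class AdaDelta:
    """Per-parameter AdaDelta update (Zeiler 2012) -- a minimal sketch.

    rho and eps are the only hyperparameters; the paper uses values
    such as rho=0.95 and eps=1e-6.
    """

    def __init__(self, shape, rho=0.95, eps=1e-6):
        self.rho, self.eps = rho, eps
        self.acc_grad = np.zeros(shape)   # running average of squared gradients E[g^2]
        self.acc_delta = np.zeros(shape)  # running average of squared updates   E[dx^2]

    def step(self, grad):
        # Accumulate gradient: E[g^2] = rho * E[g^2] + (1 - rho) * g^2
        self.acc_grad = self.rho * self.acc_grad + (1 - self.rho) * grad ** 2
        # Compute update: dx = -(RMS[dx]_{t-1} / RMS[g]_t) * g
        rms_delta = np.sqrt(self.acc_delta + self.eps)
        rms_grad = np.sqrt(self.acc_grad + self.eps)
        delta = -(rms_delta / rms_grad) * grad
        # Accumulate updates: E[dx^2] = rho * E[dx^2] + (1 - rho) * dx^2
        self.acc_delta = self.rho * self.acc_delta + (1 - self.rho) * delta ** 2
        return delta  # add this to the parameter

# Usage (one optimizer state per weight tensor): w += opt_w.step(grad_w)
```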

When I used AdaDelta-SGD as the learning method in a Denoising AutoEncoder, it did converge in some specific settings, but not always. When I used MSE as the loss function and Sigmoid as the activation function, it converged very quickly, and after 100 epochs the final reconstruction loss was better than that of plain SGD, SGD with Momentum, and AdaGrad.
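For concreteness, the setup is roughly the following: a tied-weight denoising autoencoder with masking noise, Sigmoid activations, and MSE against the clean input. This is only an illustrative NumPy sketch; the function and parameter names (`dae_mse_loss`, `W`, `b`, `b_prime`, `corruption`) are assumptions, not my actual library code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dae_mse_loss(x, W, b, b_prime, corruption=0.3, rng=None):
    rng = rng or np.random.default_rng(0)
    # Corrupt the input with masking noise (zero out a fraction of entries).
    mask = rng.random(x.shape) > corruption
    x_tilde = x * mask
    # Encode and decode with tied weights; sigmoid keeps the reconstruction in (0, 1).
    h = sigmoid(x_tilde @ W + b)
    x_hat = sigmoid(h @ W.T + b_prime)
    # Mean squared reconstruction error against the clean input.
    return np.mean((x_hat - x) ** 2)
```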

But when I used ReLU as the activation function, it didn't converge; instead it stayed stuck (oscillating) at a high (bad) reconstruction loss, just as when you use plain SGD with a very high learning rate. The reconstruction loss it got stuck at was about 10 to 20 times higher than the final reconstruction loss reached with the Momentum method.

I really don't understand why this happens, since the paper says AdaDelta just works. Please let me know the reason behind this phenomenon and how I can avoid it.

Recommended answer

The activation of a ReLU is unbounded, which makes it difficult to use in autoencoders, since your training vectors likely do not have arbitrarily large, unbounded responses. ReLU simply isn't a good fit for that type of network.

You can force a ReLU into an autoencoder by applying some transformation to the output layer, as is done here. However, they don't discuss the quality of the results as an autoencoder, only as a pre-training method for classification, so it's not clear that it's a worthwhile approach for building an autoencoder either.
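As a rough illustration of that idea, here is a minimal NumPy sketch of a ReLU encoder combined with a bounded (sigmoid) output layer so reconstructions stay in the same range as inputs scaled to [0, 1]. The parameter names (`W_enc`, `b_enc`, `W_dec`, `b_dec`) are illustrative, and this is not necessarily the exact transformation used in the linked work:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x_tilde, W_enc, b_enc, W_dec, b_dec):
    # ReLU hidden units: unbounded, sparse codes.
    h = relu(x_tilde @ W_enc + b_enc)
    # Bounded output layer so the reconstruction matches the input range;
    # a linear output with MSE is another common choice for unnormalized inputs.
    return sigmoid(h @ W_dec + b_dec)
```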
