How to set parameters of the Adadelta Algorithm in Tensorflow correctly?


Question


I've been using Tensorflow for regression purposes. My neural net is very small with 10 input neurons, 12 hidden neurons in a single layer and 5 output neurons.

  • activation function is relu
  • cost is the squared distance between output and real value
  • my neural net trains correctly with other optimizers such as GradientDescent, Adam, Adagrad.

However, when I try to use Adadelta, the neural net simply won't train: the variables stay the same at every step.

I have tried every possible initial learning_rate (from 1.0e-6 to 10) and different weight initializations: the result is always the same.
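For reference, a hypothetical sketch of the setup described above (the question includes no code, so the layer construction and all names here are assumptions):

import tensorflow as tf

# Hypothetical reconstruction of the described setup: 10 inputs, one
# hidden layer of 12 relu units, 5 outputs, squared-distance cost.
x = tf.placeholder(tf.float32, [None, 10])
y = tf.placeholder(tf.float32, [None, 5])

hidden = tf.layers.dense(x, 12, activation=tf.nn.relu)
output = tf.layers.dense(hidden, 5)

loss = tf.reduce_mean(tf.square(output - y))
train_op = tf.train.AdadeltaOptimizer().minimize(loss)  # the optimizer that fails to train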

Does anyone have any idea of what is going on?

Thanks so much

Solution

Short answer: don't use Adadelta

Very few people use it today; you should instead stick to one of the following:

  • tf.train.MomentumOptimizer with 0.9 momentum is very standard and works well. The drawback is that you have to find the best learning rate yourself.
  • tf.train.RMSPropOptimizer: the results are less dependent on a good learning rate. This algorithm is very similar to Adadelta, but in my opinion it performs better. (See the sketch after this list.)
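A minimal sketch of the two alternatives, using the same TF 1.x API as the code further down; the learning rates here are illustrative guesses, not tuned values:

import tensorflow as tf

v = tf.Variable(10.)
loss = v * v

# Momentum with 0.9: standard and reliable, but the learning rate must be tuned by hand.
momentum_op = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9).minimize(loss)

# RMSProp: less sensitive to the exact learning rate, and close to Adadelta in spirit.
rmsprop_op = tf.train.RMSPropOptimizer(learning_rate=0.001).minimize(loss)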

If you really want to use Adadelta, use the parameters from the paper: learning_rate=1., rho=0.95, epsilon=1e-6. A bigger epsilon will help at the start, but be prepared to wait a bit longer than with other optimizers to see convergence.

Note that in the paper, they don't even use a learning rate, which is the same as keeping it equal to 1.


Long answer

Adadelta has a very slow start. The full algorithm from the paper is:

    E[g²]_t  = ρ · E[g²]_{t-1} + (1 - ρ) · g_t²                    (accumulate squared gradients)
    Δx_t     = -( √(E[Δx²]_{t-1} + ε) / √(E[g²]_t + ε) ) · g_t     (compute update)
    E[Δx²]_t = ρ · E[Δx²]_{t-1} + (1 - ρ) · Δx_t²                  (accumulate squared updates)
    x_{t+1}  = x_t + Δx_t                                          (apply update)

The issue is that the algorithm accumulates a running average of the squared updates.

  • At step 0, the running average of these updates is zero, so the first update will be very small.
  • As the first update is very small, the running average of the squared updates will also be very small at the beginning, which is a kind of vicious circle (see the sketch after this list).
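To see the vicious circle in numbers, here is a minimal NumPy sketch of the update rule from the paper (my own re-implementation for illustration, not TensorFlow's exact code), on the same toy problem loss = v * v used below:

import numpy as np

rho, eps = 0.95, 1e-6   # parameters from the paper
v = 10.0                # parameter to optimize, loss = v**2
Eg2 = 0.0               # running average of the squared gradients
Edx2 = 0.0              # running average of the squared updates

for step in range(5):
    g = 2.0 * v                                          # gradient of v**2
    Eg2 = rho * Eg2 + (1 - rho) * g ** 2                 # accumulate squared gradient
    dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * g   # update is tiny while Edx2 is ~0
    Edx2 = rho * Edx2 + (1 - rho) * dx ** 2              # accumulate squared update
    v += dx
    print("step %d: update = %.6f, v = %.4f" % (step, dx, v))

With zeroed accumulators, the very first update has magnitude roughly √(ε / (1 - ρ)) ≈ 0.0045 regardless of the gradient, which helps explain why tuning the learning rate alone did not help in the question above.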

I think Adadelta performs better on bigger networks than yours, and after some iterations it should match the performance of RMSProp or Adam.


Here is my code to play a bit with the Adadelta optimizer:

import tensorflow as tf  # TF 1.x API, as in the rest of this answer

v = tf.Variable(10.)
loss = v * v

# Parameters from the paper: learning_rate=1., rho=0.95, epsilon=1e-6
optimizer = tf.train.AdadeltaOptimizer(1., 0.95, 1e-6)
train_op = optimizer.minimize(loss)

accum = optimizer.get_slot(v, "accum")  # accumulator of the squared gradients
accum_update = optimizer.get_slot(v, "accum_update")  # accumulator of the squared updates

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for i in range(100):
    sess.run(train_op)
    print("%.3f \t %.3f \t %.6f" % tuple(sess.run([v, accum, accum_update])))

The first 10 lines:

  v       accum     accum_update
9.994    20.000      0.000001
9.988    38.975      0.000002
9.983    56.979      0.000003
9.978    74.061      0.000004
9.973    90.270      0.000005
9.968    105.648     0.000006
9.963    120.237     0.000006
9.958    134.077     0.000007
9.953    147.205     0.000008
9.948    159.658     0.000009
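Note how accum jumps straight to (1 - ρ) · g² = 0.05 · 20² = 20 (the gradient of v * v at v = 10 is 20), while accum_update stays near ε. Since each step is scaled by √(accum_update + ε) / √(accum + ε), v moves by only a few thousandths per step: this is the slow start described above.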
