Can one only implement gradient-descent-like optimizers with the code example for processing gradients in TensorFlow?


Question


I was looking at the example code for processing gradients that TensorFlow has:

# Create an optimizer.
opt = GradientDescentOptimizer(learning_rate=0.1)

# Compute the gradients for a list of variables.
grads_and_vars = opt.compute_gradients(loss, <list of variables>)

# grads_and_vars is a list of tuples (gradient, variable).  Do whatever you
# need to the 'gradient' part, for example cap them, etc.
capped_grads_and_vars = [(MyCapper(gv[0]), gv[1]) for gv in grads_and_vars]

# Ask the optimizer to apply the capped gradients.
opt.apply_gradients(capped_grads_and_vars)

However, I noticed that apply_gradients is a method of the GradientDescentOptimizer. Does that mean that with the example code above one can only implement gradient-descent-like rules (note that we could swap GradientDescentOptimizer for Adam or any of the other optimizers)? In particular, what does apply_gradients do? I did check the code on the TensorFlow GitHub page, but it was a bunch of Python that had nothing to do with mathematical expressions, so it was hard to tell what it was doing and how it changed from optimizer to optimizer.

For example, if I wanted to implement my own custom optimizer that might use gradients (or might not, e.g. one that just changes the weights directly with some rule, maybe a more biologically plausible one), is that not possible with the above example code?
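As a rough mental model, the compute/process/apply pattern above can be written out in plain Python for the plain gradient descent case. This is only a sketch and not TensorFlow's implementation: `cap` is a hypothetical stand-in for MyCapper, `apply_gradients_sgd` is an illustrative name, and the numbers are made up. Other optimizers such as Adam keep extra per-variable state and use a different formula in the last step.

```python
def cap(g, limit=1.0):
    """Hypothetical stand-in for MyCapper: clip a gradient to [-limit, limit]."""
    return max(-limit, min(limit, g))


def apply_gradients_sgd(grads_and_vars, learning_rate):
    """Plain gradient descent step: each variable moves by -learning_rate * gradient."""
    return [v - learning_rate * g for g, v in grads_and_vars]


grads_and_vars = [(4.0, 2.0), (-0.5, 1.0)]   # (gradient, variable) pairs
capped = [(cap(g), v) for g, v in grads_and_vars]
print(apply_gradients_sgd(capped, learning_rate=0.5))  # [1.5, 1.25]
```

The processing hook only changes the gradient that is handed over; the formula in the final step stays whatever the chosen optimizer defines.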


In particular, I wanted to implement a version of gradient descent that is artificially restricted to a compact domain, namely the following equation:

w := (w - mu*grad + eps) mod B

in TensorFlow. I realized that the following is true:

w := w mod B - mu*grad mod B + eps mod B

so I thought that I could just implement it by doing:

def Process_grads(g,mu_noise,stddev_noise,B):
    return (g+tf.random_normal(tf.shape(g),mean=mu_noise,stddev=stddev_noise) ) % B

and then just having:

processed_grads_and_vars = [(Process_grads(gv[0], mu_noise, stddev_noise, B), gv[1]) for gv in grads_and_vars]
# Ask the optimizer to apply the processed gradients.
opt.apply_gradients(processed_grads_and_vars)

However, I realized that this wasn't good enough, because I don't actually have access to w, so I can't implement:

w mod B

at least not the way I tried. Is there a way to do this, i.e. to directly change the update rule?

I know it's sort of a hacky update rule, but my point is more about changing the update equation than about that particular rule (so don't get hung up on it if it's a bit weird).
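To make the obstacle concrete, here is a minimal plain-Python sketch (not TensorFlow code) of the two update paths on a single scalar weight. The function names `wrapped_update` and `descent_with_processed_grad` and the numeric values are illustrative assumptions, and the noise term eps is passed in explicitly so the result is reproducible.

```python
def wrapped_update(w, grad, mu, eps, B):
    """Intended rule: w := (w - mu*grad + eps) mod B."""
    return (w - mu * grad + eps) % B


def descent_with_processed_grad(w, grad, mu, eps, B):
    """What the gradient-processing hook can express: only the gradient is
    changed, and a plain descent step is applied afterwards."""
    processed = (grad + eps) % B   # mirrors Process_grads above
    return w - mu * processed      # plain gradient descent step


w, grad, mu, eps, B = 0.5, 6.0, 0.5, 0.0, 20.0

print(wrapped_update(w, grad, mu, eps, B))               # 17.5, inside [0, B)
print(descent_with_processed_grad(w, grad, mu, eps, B))  # -2.5, outside [0, B)
```

Because the wrap has to be applied to the updated weight itself, changing only the gradient cannot keep w inside the domain.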


I came up with a super hacky solution:

def manual_update_GDL(arg,learning_rate,g,mu_noise,stddev_noise):
    with tf.variable_scope(arg.mdl_scope_name,reuse=True):
        W_var = tf.get_variable(name='W')
        eps = tf.random_normal(tf.shape(g),mean=mu_noise,stddev=stddev_noise)
        #
        W_new = tf.mod( W_var - learning_rate*g + eps , 20)
        sess.run( W_var.assign(W_new) )

def manual_GDL(arg,loss,learning_rate,mu_noise,stddev_noise,compact,B):
    # Compute the gradients for a list of variables.
    grads_and_vars = opt.compute_gradients(loss)
    # process gradients
    processed_grads_and_vars = [(manual_update_GDL(arg,learning_rate,g,mu_noise,stddev_noise), v) for g,v in grads_and_vars]

I'm not sure if it works, but something like that should work in general. The idea is to just write down the update equation one wants to use (in TensorFlow) and then update the weights manually using a session.

Unfortunately, such a solution means we have to take care of the annealing ourselves (decaying the learning rate manually, which seems annoying). This solution probably has many other problems; feel free to point them out (and give solutions if you can).


For this very simple problem, I realized one can just apply the normal optimizer update and then take the mod of the weights and re-assign the result:

sess.run(fetches=train_step)
if arg.compact:
    # apply w := ( w - mu*g + eps ) mod B
    W_val = W_var.eval()
    W_new = tf.mod(W_var,arg.B).eval()
    W_var.assign(W_new).eval()

But in this case it's a coincidence that such a simple solution exists (and unfortunately, it bypasses the whole point of my question).

Actually, this solution slows down the code a lot. For the moment it is the best that I've got.


For reference, I have seen the question "How to create an optimizer in Tensorflow", but didn't find that it answered my question directly.

Solution

Your solution slows down the code because you call sess.run and .eval() while creating your "train_step". Instead, you should create the train_step graph using only internal TensorFlow functions (without sess.run or .eval()). After that, you only evaluate train_step in a loop.

If you don't want to use any standard optimizer, you can write your own "apply gradient" graph. Here is one possible solution:

learning_rate = tf.Variable(tf.constant(0.1))
mu_noise = 0.
stddev_noise = 0.01

#add all your W variables here when you have more than one:
train_w_vars_list = [W]
grad = tf.gradients(some_loss, train_w_vars_list)

assign_list = []
for g, v in zip(grad, train_w_vars_list):
  eps = tf.random_normal(tf.shape(g), mean=mu_noise, stddev=stddev_noise)
  assign_list.append(v.assign(tf.mod(v - learning_rate*g + eps, 20)))

#also update the learning rate here if you want to:
assign_list.append(learning_rate.assign(learning_rate - 0.001))

train_step = tf.group(*assign_list)

You can also use one of the standard optimizers to create the grads_and_vars list (and then iterate over it instead of zip(grad, train_w_vars_list)).
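The arithmetic of this custom step can be checked without TensorFlow. The following is a plain-Python sketch under stated assumptions: the toy quadratic loss, the names `custom_step`, `decay` and `target`, and the fixed seed are all illustrative. The learning rate is also clamped at zero, which the graph above does not do; subtracting 0.001 per step from 0.1 would otherwise make the rate negative after about 100 steps.

```python
import random


def custom_step(weights, grads, learning_rate, B,
                mu_noise=0.0, stddev_noise=0.01, rng=random):
    """One update w := (w - learning_rate*g + eps) mod B for every weight."""
    new_weights = []
    for w, g in zip(weights, grads):
        eps = rng.gauss(mu_noise, stddev_noise)
        new_weights.append((w - learning_rate * g + eps) % B)
    return new_weights


def decay(learning_rate, decrement=0.001):
    """Linear decay, clamped so the rate never becomes negative (an assumption)."""
    return max(learning_rate - decrement, 0.0)


rng = random.Random(0)    # fixed seed for reproducibility
B = 20.0
target = [3.0, 7.0]       # hypothetical minimiser of the toy loss
weights = [15.0, 1.0]
learning_rate = 0.1

for _ in range(1000):
    # gradient of the toy loss 0.5 * sum((w - t)**2)
    grads = [w - t for w, t in zip(weights, target)]
    weights = custom_step(weights, grads, learning_rate, B, rng=rng)
    learning_rate = decay(learning_rate)

print(weights, learning_rate)
```

In the TensorFlow version the same structure is built once as graph operations, and only train_step is evaluated inside the loop.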

Here is a simple example for MNIST with your loss:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensorflow.examples.tutorials.mnist import input_data

import tensorflow as tf

# Import data
mnist = input_data.read_data_sets('PATH TO MNIST_data', one_hot=True)

# Create the model
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
y = tf.matmul(x, W)


# Define loss and optimizer
y_ = tf.placeholder(tf.float32, [None, 10])

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))

learning_rate = tf.Variable(tf.constant(0.1))
mu_noise = 0.
stddev_noise = 0.01

#add all your W variables here when you have more than one:
train_w_vars_list = [W]
grad = tf.gradients(cross_entropy, train_w_vars_list)

assign_list = []
for g, v in zip(grad, train_w_vars_list):
  eps = tf.random_normal(tf.shape(g), mean=mu_noise, stddev=stddev_noise)
  assign_list.append(v.assign(tf.mod(v - learning_rate*g + eps, 20)))

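# also update the learning rate here if you want to
# (note: subtracting 0.001 per step from 0.1 makes the rate negative after
# about 100 steps, so use a smaller decrement or clamp it for longer runs):
assign_list.append(learning_rate.assign(learning_rate - 0.001))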

train_step = tf.group(*assign_list)


sess = tf.InteractiveSession()
tf.global_variables_initializer().run()


# Train
for _ in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})


# Test trained model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                    y_: mnist.test.labels}))
