How to use stop_gradient in TensorFlow

Question

I'm wondering how to use stop_gradient in TensorFlow; the documentation is not clear to me.

I'm currently using stop_gradient to produce the gradient of the loss function w.r.t. the word embeddings in a CBOW word2vec model. I just want to get the gradient values, not do backpropagation (as I'm generating adversarial examples).

Currently, I'm using the code:

lossGrad = gradients.gradients(loss, embed)[0]
real_grad = lossGrad.eval(feed_dict)

But when I run this, it does the backpropagation anyway! What am I doing wrong, and just as importantly, how can I fix this?

CLARIFICATION: By "backpropagation" I mean "calculating values and updating model parameters".

If I run the two lines above after the first training step, then I get a different loss after 100 training steps than when I don't run those two lines. I might be fundamentally misunderstanding something about TensorFlow.

I've tried calling set_random_seed both at the beginning of the graph declaration and before each training step. The total loss is consistent between multiple runs, but not between including and excluding those two lines. So if it's not the RNG causing the disparity, and it's not unanticipated updating of the model parameters between training steps, do you have any idea what would cause this behavior?

Welp, it's a bit late, but here's how I solved it. I only wanted to optimize over some, but not all, variables. I thought that the way to prevent optimizing some variables would be to use stop_gradient, but I never found a way to make that work. Maybe there is a way, but what worked for me was to adjust my optimizer to only optimize over a list of variables. So instead of:

opt = tf.train.GradientDescentOptimizer(learning_rate=eta)
train_op = opt.minimize(loss)

I used:

opt = tf.train.GradientDescentOptimizer(learning_rate=eta)
# vars_to_optimize is a placeholder name: a list of the tf.Variable objects to update
train_op = opt.minimize(loss, var_list=vars_to_optimize)

This prevented opt from updating the variables not in var_list. Hopefully it works for you, too!
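
As a rough illustration of what restricting an update to a variable list means, here is a plain-Python sketch. It is not TensorFlow code, and the names params, grads_of, and sgd_step are made up for this example. Gradients are computed for every parameter, but the update step only changes the names listed in var_list.

```python
def grads_of(params, x, y):
    """Gradients of loss = (w * x + b - y) ** 2 with respect to w and b."""
    err = params["w"] * x + params["b"] - y
    return {"w": 2.0 * err * x, "b": 2.0 * err}


def sgd_step(params, grads, var_list, lr=0.1):
    """Return updated parameters; only names in var_list are changed."""
    return {
        name: value - lr * grads[name] if name in var_list else value
        for name, value in params.items()
    }


params = {"w": 1.0, "b": 0.5}
grads = grads_of(params, x=2.0, y=1.0)

# Only "w" is listed, so "b" keeps its value even though its gradient is non-zero.
updated = sgd_step(params, grads, var_list=["w"])
print(updated)
```

Note that computing grads changes nothing by itself; parameters only change when the update step runs.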

Recommended answer

tf.stop_gradient makes TensorFlow treat a tensor as a constant during back-propagation, so no gradient flows through it to the variables that produced it.

For example, in the code below, we have three variables, w1, w2, and w3, and an input x. The loss is square(x.dot(w1) - x.dot(w2 * w3)). We want to minimize this loss with respect to w1 while keeping w2 and w3 fixed. To achieve this, we can wrap the second term in tf.stop_gradient: tf.stop_gradient(tf.matmul(x, w2*w3)).

In the figure below, I plotted how w1, w2, and w3 change from their initial values as a function of training iterations. It can be seen that w2 and w3 remain fixed, while w1 changes until it becomes equal to w2 * w3.

[Figure: only w1 is updated during training; w2 and w3 stay at their initial values.]

import tensorflow as tf
import numpy as np

# TensorFlow 1.x graph-mode API: three 5x1 weight variables and a batch input.
w1 = tf.get_variable("w1", shape=[5, 1], initializer=tf.truncated_normal_initializer())
w2 = tf.get_variable("w2", shape=[5, 1], initializer=tf.truncated_normal_initializer())
w3 = tf.get_variable("w3", shape=[5, 1], initializer=tf.truncated_normal_initializer())
x = tf.placeholder(tf.float32, shape=[None, 5], name="x")

a1 = tf.matmul(x, w1)
a2 = tf.matmul(x, w2*w3)
# Treat a2 as a constant in the backward pass, so no gradient reaches w2 or w3.
a2 = tf.stop_gradient(a2)
loss = tf.reduce_mean(tf.square(a1 - a2))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
# compute_gradients returns (gradient, variable) pairs; the gradient is None for w2 and w3.
gradients = optimizer.compute_gradients(loss)
# Running train_op is what actually updates the variables.
train_op = optimizer.apply_gradients(gradients)
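
To make the effect concrete without a TensorFlow session, the following NumPy sketch writes out the same update rule by hand. It illustrates the math rather than the answer's actual training loop. The batch size of 64, the 500 steps, and the random seed are arbitrary assumptions. Because the x.dot(w2 * w3) branch is treated as a constant, the gradients for w2 and w3 are zero, and only w1 moves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same shapes as the TensorFlow example: three 5x1 weight vectors.
w1 = rng.normal(size=(5, 1))
w2 = rng.normal(size=(5, 1))
w3 = rng.normal(size=(5, 1))
w2_init, w3_init = w2.copy(), w3.copy()

learning_rate = 0.1
for _ in range(500):
    x = rng.normal(size=(64, 5))  # a fresh random batch each step (assumed)
    a1 = x @ w1
    a2 = x @ (w2 * w3)            # "stopped" branch: treated as a constant

    # Gradient of mean((a1 - a2) ** 2) with respect to w1.
    grad_w1 = 2.0 * x.T @ (a1 - a2) / x.shape[0]
    # No gradient flows through the stopped branch to w2 or w3.
    grad_w2 = np.zeros_like(w2)
    grad_w3 = np.zeros_like(w3)

    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
    w3 -= learning_rate * grad_w3

# Largest remaining gap between w1 and the element-wise product w2 * w3.
print(np.abs(w1 - w2 * w3).max())
```

After the loop, w2 and w3 still hold their initial values, while w1 has moved toward w2 * w3, which matches the behavior described for the figure.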
