Why gradient of tanh in tensorflow is `grad = dy * (1 - y*y)`


Question

The documentation of tf.raw_ops.TanhGrad says `grad = dy * (1 - y*y)`, where `y = tanh(x)`.


But I think since dy / dx = 1 - y*y, where y = tanh(x), grad should be dy / (1 - y*y). Where am I wrong?
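A quick way to see what the op returns is to call it directly and compare against the documented formula. A minimal sketch, assuming TensorFlow 2.x with eager execution (the sample values and the all-ones upstream gradient are illustrative choices):

```python
import tensorflow as tf

x = tf.constant([0.5, -1.0, 2.0])
y = tf.tanh(x)
dy = tf.ones_like(y)  # assumed upstream gradient, as if d(loss)/dy == 1

grad = tf.raw_ops.TanhGrad(y=y, dy=dy)  # note: the op takes y (not x) and dy
manual = dy * (1.0 - y * y)             # the documented formula

print(grad.numpy())
print(manual.numpy())  # same values: TanhGrad implements this formula
```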

Answer


An expression like dy / dx is mathematical notation for the derivative; it is not an actual fraction. It is meaningless to move dy or dx around individually as you would with a numerator and a denominator.


Mathematically, it is known that d(tanh(x))/dx = 1 - (tanh(x))^2. TensorFlow computes gradients "backwards" (this is called backpropagation, or more generally reverse-mode automatic differentiation). That means that, in general, we reach the computation of the gradient of tanh(x) only after computing the gradient of an "outer" function g(tanh(x)), where g represents all the operations applied to the output of tanh to reach the value for which the gradient is computed. By the chain rule, the derivative of this composite function is d(g(tanh(x)))/dx = d(g(tanh(x)))/d(tanh(x)) * d(tanh(x))/dx. The first factor, d(g(tanh(x)))/d(tanh(x)), is the gradient accumulated in reverse up to tanh, that is, the derivative through all those later operations; it is the value called dy in the documentation of the function. Therefore, you only need to compute d(tanh(x))/dx (which is (1 - y*y), because y = tanh(x)) and multiply it by the given dy. The resulting value is then propagated further back to the operation that produced the input x to tanh in the first place, where it becomes the dy in that operation's gradient computation, and so on until the gradient sources are reached.
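To make the chain rule concrete, here is a sketch that picks an arbitrary outer function g(y) = y*y and checks that the gradient TensorFlow reports for x equals dy * (1 - y*y) with dy = dg/dy = 2y. The choice of g and the sample values are assumptions for illustration only:

```python
import tensorflow as tf

x = tf.constant([0.3, 1.5])

with tf.GradientTape() as tape:
    tape.watch(x)   # x is a constant, so it must be watched explicitly
    y = tf.tanh(x)
    z = y * y       # g(y) = y^2 stands in for "all later operations"

dz_dx = tape.gradient(z, x)

# By hand: the gradient flowing back into tanh is dy = dg/dy = 2*y,
# and TanhGrad multiplies it by the local derivative (1 - y*y).
dy = 2.0 * y
expected = dy * (1.0 - y * y)

print(dz_dx.numpy())
print(expected.numpy())  # matches dz_dx
```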
