How is the concept of gradient in TensorFlow related to the mathematical definition of gradient?


Question

The TensorFlow documentation describing the function

tf.gradients(
    ys,
    xs,
    grad_ys=None,
    name='gradients',
    colocate_gradients_with_ops=False,
    gate_gradients=False,
    aggregation_method=None,
    stop_gradients=None
)

says:

  • [it] constructs symbolic derivatives of the sum of ys w.r.t. x in xs.
  • ys and xs are each a Tensor or a list of tensors.
  • gradients() adds ops to the graph to output the derivatives of ys with respect to xs.
  • ys: a Tensor or list of tensors to be differentiated.
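
For concreteness, here is a minimal sketch of such a call (my own illustration using the TF 1.x graph API, not taken from the docs): for a scalar y built from a vector x, tf.gradients recovers the familiar gradient.

import tensorflow as tf   # TF 1.x graph mode; under TF 2.x use the tf.compat.v1 APIs

# A rank-1 tensor (vector) x and a scalar y = x1^2 + x2^2 + x3^2 built from it.
x = tf.placeholder(tf.float32, shape=[3])
y = tf.reduce_sum(x ** 2)

# tf.gradients returns one gradient tensor per entry of xs; here dy/dx = 2*x.
grad = tf.gradients(ys=y, xs=x)[0]

with tf.Session() as sess:
    print(sess.run(grad, feed_dict={x: [1.0, 2.0, 3.0]}))   # -> [2. 4. 6.]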

I find it difficult to relate this to the mathematical definition of gradient. For example, according to Wikipedia, the gradient of a scalar function f(x1, x2, x3, ..., xn) is a vector field (i.e. a function grad f : R^n -> R^n) with certain properties involving the dot product of vectors. You can also speak about the gradient of f at a certain point: (grad f)(x1, x2, x3, ..., xn).
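
In symbols, that Wikipedia definition reads (standard calculus notation, added here for reference):

\nabla f = \left(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\right),
\qquad (\nabla f)(p) \cdot v = D_v f(p) \text{ for every direction } v,

where D_v f(p) is the directional derivative of f at the point p.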

The TensorFlow documentation speaks about tensors instead of vectors: can the definition of gradient be generalized from functions that map vectors to scalars to functions that map tensors to scalars? Is there a dot product between tensors?

Even if the definition of gradient can be applied to functions f that map tensors to scalars (with the dot product in the definition working on tensors), the documentation speaks about differentiating the tensors themselves: the parameter ys is a "Tensor or list of tensors to be differentiated". According to the documentation, a "Tensor is a multi-dimensional array used for computation", so a tensor is not a function; how can it be differentiated?

So, how exactly is this concept of gradient in TensorFlow related to the definition from Wikipedia?

Answer

One would expect the TensorFlow gradient to simply be the Jacobian, i.e. the derivative of a rank-m tensor Y with respect to a rank-n tensor X would be the rank-(m + n) tensor comprised of each individual derivative ∂Y_{j1...jm}/∂X_{i1...in}.

However, you may notice that the gradient isn't actually a rank-(m + n) tensor, but always has the rank n of the tensor X -- indeed, TensorFlow gives you the gradient of the scalar sum(Y) with respect to X.
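
A small sketch makes this concrete (my own example in the TF 1.x graph API; the function Y is purely illustrative). For Y = [x0^2, x0*x1] the true Jacobian is a 2x2 matrix, yet tf.gradients returns a single vector shaped like X, and it coincides with the gradient of sum(Y):

import tensorflow as tf   # TF 1.x graph mode

x = tf.placeholder(tf.float32, shape=[2])
y = tf.stack([x[0] ** 2, x[0] * x[1]])   # rank-1 Y, so the full Jacobian dY/dX is 2x2

# tf.gradients collapses Y: it returns d(sum(Y))/dX, shaped like X ...
g = tf.gradients(ys=y, xs=x)[0]

# ... which matches the gradient of the explicit scalar sum(Y).
g_sum = tf.gradients(ys=tf.reduce_sum(y), xs=x)[0]

with tf.Session() as sess:
    feed = {x: [3.0, 5.0]}
    print(sess.run(g, feed_dict=feed))       # [2*x0 + x1, x0] -> [11. 3.]
    print(sess.run(g_sum, feed_dict=feed))   # identical: [11. 3.]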

Of course, the real Jacobians are stored internally for use when applying the chain rule.
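
The grad_ys argument hints at this machinery: tf.gradients really computes a vector-Jacobian product, which is exactly the building block the chain rule needs. A hedged sketch, reusing the same illustrative Y as above:

import tensorflow as tf   # TF 1.x graph mode

x = tf.placeholder(tf.float32, shape=[2])
y = tf.stack([x[0] ** 2, x[0] * x[1]])

# grad_ys supplies an "upstream" vector u; tf.gradients then returns the
# vector-Jacobian product u^T (dY/dX). The default (u = all ones) is what
# yields the gradient of sum(Y) discussed above.
u = tf.constant([1.0, 0.0])   # selects the first row of the Jacobian
vjp = tf.gradients(ys=y, xs=x, grad_ys=u)[0]

with tf.Session() as sess:
    print(sess.run(vjp, feed_dict={x: [3.0, 5.0]}))   # dY[0]/dX = [2*x0, 0] -> [6. 0.]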
