Does tensorflow use automatic or symbolic gradients?


Problem description


I haven't been able to find a clear statement of whether tensorflow uses automatic or symbolic differentiation.

I skimmed the tensorflow paper and they mention automatic gradients, but it is unclear if they just mean symbolic gradients, as they also mention that it has that capability.

Solution

TF uses automatic differentiation, and more specifically reverse-mode automatic differentiation.
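A minimal sketch of what this looks like from the user's side, assuming the TF 2.x eager `GradientTape` API (the function and values here are chosen purely for illustration):

```python
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x + 2.0 * x          # forward pass: y = x^2 + 2x
dy_dx = tape.gradient(y, x)      # reverse pass applies the chain rule
print(dy_dx.numpy())             # 8.0, i.e. 2x + 2 at x = 3
```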


There are 3 popular methods to calculate the derivative:

  1. Numerical differentiation
  2. Symbolic differentiation
  3. Automatic differentiation

Numerical differentiation relies on the definition of the derivative: f'(x) ≈ (f(x + h) - f(x)) / h, where you put a very small h and evaluate the function in two places. This is the most basic formula; in practice people use other formulas which give a smaller estimation error. This way of calculating a derivative is suitable mostly if you do not know your function and can only sample it. It also requires a lot of computation for a high-dimensional function.
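A minimal sketch of the idea, showing the basic forward difference plus the central difference as one of the lower-error variants (the test function is illustrative):

```python
def forward_diff(f, x, h=1e-6):
    # The most basic formula: (f(x+h) - f(x)) / h
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h=1e-6):
    # A common lower-error variant: (f(x+h) - f(x-h)) / (2h)
    return (f(x + h) - f(x - h)) / (2 * h)

print(forward_diff(lambda x: x ** 2, 3.0))  # ~6.0; true derivative is 2x = 6
print(central_diff(lambda x: x ** 2, 3.0))  # ~6.0, with smaller error
```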

Symbolic differentiation manipulates mathematical expressions. If you ever used Matlab or Mathematica, then you saw something like the following.
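A minimal sketch of that kind of output, using SymPy as an analogous symbolic system (the expression is chosen purely for illustration):

```python
import sympy

x = sympy.symbols('x')
expr = x ** 2 * sympy.sin(x)
print(sympy.diff(expr, x))  # x**2*cos(x) + 2*x*sin(x), via the product rule
```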

Here, for every math expression the system knows the derivative and uses various rules (product rule, chain rule) to calculate the resulting derivative. It then simplifies the result to obtain the final expression.

Automatic differentiation manipulates blocks of computer programs. A differentiator has rules for taking the derivative of each element of a program (when you define any op in core TF, you need to register a gradient for this op). It also uses the chain rule to break complex expressions into simpler ones. Here is a good example of how it works in real TF programs, with some explanation.
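Separately, here is a toy sketch of the core idea (this is not TensorFlow's actual machinery; the `Var` class and all names are invented purely for illustration): every operation records its inputs together with their local derivatives, and a reverse pass propagates gradients through them with the chain rule.

```python
class Var:
    """One node of a recorded program: a value plus how it was produced."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        # d(x + y)/dx = 1, d(x + y)/dy = 1
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(x * y)/dx = y, d(x * y)/dy = x
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        # Chain rule: accumulate this path's contribution, then recurse.
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

x, y = Var(3.0), Var(4.0)
z = x * y + x          # z = x*y + x
z.backward()           # reverse pass from the output
print(x.grad, y.grad)  # 5.0 3.0, i.e. dz/dx = y + 1, dz/dy = x
```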


You might think that automatic differentiation is the same as symbolic differentiation (in one place they operate on math expressions, in the other on computer programs). And yes, they are sometimes very similar. But for control flow statements (`if`, `while`, loops) the results can be very different:

> Symbolic differentiation leads to inefficient code (unless carefully done) and faces the difficulty of converting a computer program into a single expression.
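A hedged sketch of that difference, assuming TF 2.x eager mode (the function and values are invented for illustration): autodiff simply differentiates the path the program actually takes, whereas symbolic differentiation would first have to fold the loop and branch into a single closed-form expression.

```python
import tensorflow as tf

def f(x):
    result = x
    for _ in range(3):
        if result < 100.0:       # data-dependent control flow
            result = result * x
    return result

x = tf.Variable(2.0)
with tf.GradientTape() as tape:
    y = f(x)                     # every branch fires here, so y = x**4
print(tape.gradient(y, x).numpy())  # 32.0 = 4 * x**3 at x = 2
```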

