What is the purpose of the Tensorflow Gradient Tape?

Question

I watched the Tensorflow Developer Summit video on eager execution in Tensorflow, and the presenter gave an introduction to "Gradient Tape". Now I understand that Gradient Tape tracks the automatic differentiation that occurs in a TF model.

I was trying to understand why I would use Gradient Tape. Can anyone explain how Gradient Tape is used as a diagnostic tool? Why would someone use Gradient Tape versus just TensorBoard visualization of weights?

So I get that the automatic differentiation that occurs with a model is to compute the gradients at each node--meaning the adjustment of the weights and biases at each node, given some batch of data. So that is the learning process. But I was under the impression that I can actually use a tf.keras.callbacks.TensorBoard() call to see the TensorBoard visualization of training--so I can watch the weights on each node and determine if there are any dead or oversaturated nodes.

Is the use of Gradient Tape only to see if some gradients go to zero or get really big, etc.? Or is there some other use of Gradient Tape?

Answer

With eager execution enabled, Tensorflow calculates the values of tensors as they occur in your code. This means that it won't precompute a static graph whose inputs are fed in through placeholders. It also means that to backpropagate errors, you have to keep track of the gradients of your computation yourself and then apply those gradients to an optimiser.
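For example, here is a minimal sketch of that workflow (the variable and loss are illustrative toys, not from the original question):

```python
import tensorflow as tf

w = tf.Variable(3.0)  # a toy parameter to optimise

with tf.GradientTape() as tape:
    loss = w * w  # operations on watched variables are recorded on the tape

# Ask the tape for d(loss)/d(w), then hand the gradient to an optimiser.
grad = tape.gradient(loss, w)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
optimizer.apply_gradients([(grad, w)])  # w is updated in place
```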

This is very different from running without eager execution, where you would build a graph and then simply use sess.run to evaluate your loss before passing it into an optimiser directly.
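For contrast, a rough sketch of that graph-based workflow (TF1-style, written here against the tf.compat.v1 API):

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

x = tf.compat.v1.placeholder(tf.float32, shape=())
w = tf.compat.v1.get_variable("w", initializer=3.0)
loss = (w * x) ** 2

# The optimiser differentiates the static graph itself; no tape needed.
train_op = tf.compat.v1.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    sess.run(train_op, feed_dict={x: 1.0})
```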

Fundamentally, because tensors are evaluated immediately, you don't have a graph with which to calculate gradients, and so you need a gradient tape. It is not so much that it is just used for visualisation; rather, you cannot implement gradient descent in eager mode without it.

Obviously, Tensorflow could just keep track of every gradient for every computation on every tf.Variable. However, that could be a huge performance bottleneck. They expose a gradient tape so that you can control what areas of your code need the gradient information. Note that in non-eager mode, this is statically determined based on the computational branches that are descendants of your loss, but in eager mode there is no static graph and so no way of knowing.
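For example, you can tell the tape exactly what to record; a small sketch (watch_accessed_variables and persistent are real GradientTape arguments, while the tensors here are illustrative):

```python
import tensorflow as tf

x = tf.constant(2.0)
w = tf.Variable(5.0)

# Record only what is explicitly watched, not every variable touched.
with tf.GradientTape(persistent=True, watch_accessed_variables=False) as tape:
    tape.watch(x)   # track gradients with respect to x only
    y = w * x * x   # w participates but is deliberately not recorded

print(tape.gradient(y, x))  # dy/dx = 2*w*x -> 20.0
print(tape.gradient(y, w))  # None: w was never watched by the tape
del tape  # release resources held by the persistent tape
```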
