Non-deterministic Gradient Computation


Problem description

I realized that my models end up being different every time I train them, even though I keep the TensorFlow random seed the same.

I have verified that:

  • Initialization is deterministic; the weights are identical before the first update.
  • Inputs are deterministic. In fact, various forward computations, including the loss, are identical for the very first batch.
  • The gradients for the first batch are different. Concretely, I'm comparing the outputs of tf.gradients(loss, train_variables). While loss and train_variables have identical values, the gradients are sometimes different for some of the Variables. The differences are quite significant (sometimes the sum-of-absolute-differences for a single variable's gradient is greater than 1).
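The comparison described in the last bullet can be sketched as follows, using NumPy arrays as stand-ins for the evaluated outputs of tf.gradients(loss, train_variables) from two runs (the helper name and sample values are illustrative, not from the original post):

```python
import numpy as np

def gradient_diffs(grads_a, grads_b):
    """Per-variable sum of absolute differences between two runs' gradients."""
    return [float(np.abs(a - b).sum()) for a, b in zip(grads_a, grads_b)]

# Stand-ins for the gradients evaluated on the same first batch
# in two separate training runs.
run1 = [np.array([0.5, -1.2, 3.0]), np.array([[0.1, 0.2], [0.3, 0.4]])]
run2 = [np.array([0.5, -1.2, 3.0]), np.array([[0.1, 0.9], [0.3, 1.1]])]

diffs = gradient_diffs(run1, run2)
# Any per-variable difference above a small tolerance flags
# non-determinism; the post observed sums greater than 1.
suspect = [i for i, d in enumerate(diffs) if d > 1e-6]
print(diffs)    # first variable matches, second differs by ~1.4
print(suspect)
```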

I conclude that it's the gradient computation that causes the non-determinism. I had a look at this question, and the problem persists when running on a CPU with intra_op_parallelism_threads=1 and inter_op_parallelism_threads=1.
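For reference, single-threaded execution in TF 1.x is configured as below (a sketch of the setup the question refers to; note the option names are plural, `intra_op_parallelism_threads` and `inter_op_parallelism_threads`):

```python
import tensorflow as tf

# Restrict TensorFlow to one thread per op and one op at a time,
# removing thread scheduling as a source of non-determinism.
config = tf.ConfigProto(
    intra_op_parallelism_threads=1,
    inter_op_parallelism_threads=1,
)
tf.set_random_seed(42)  # graph-level seed

with tf.Session(config=config) as sess:
    ...  # build the graph and run training as usual
```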

How can the backward pass be non-deterministic when the forward pass isn't? How could I debug this further?
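One general mechanism that makes this possible (a floating-point observation, not a diagnosis of this specific model): gradient accumulation is a reduction, and floating-point addition is not associative, so summing the same contributions in a different thread order can produce different totals even when every input value is bit-identical:

```python
# Floating-point addition is not associative: the same gradient
# contributions summed in a different order give different results.
terms = [1e16, 1.0, -1e16, 1.0]

left_to_right = sum(terms)        # large terms swallow the small ones early
reordered = sum(sorted(terms))    # a different accumulation order

print(left_to_right, reordered)   # the two sums disagree
```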

Answer

This answer might seem a little obvious, but do you use some kind of non-deterministic regularization such as dropout? Given that dropout "drops" some connections randomly during training, it may be the cause of the differences in the gradients.
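One way to test this hypothesis is to fix dropout's seed (tf.nn.dropout accepts a seed argument in TF 1.x) or disable it, and check whether the gradient differences disappear. A plain-Python sketch of why a seeded mask is reproducible (the helper function is illustrative, standing in for dropout's mask sampling):

```python
import random

def dropout_mask(n, keep_prob, seed=None):
    """Bernoulli keep/drop mask, as dropout samples during training."""
    rng = random.Random(seed)
    return [1 if rng.random() < keep_prob else 0 for _ in range(n)]

# Unseeded masks differ between runs, so gradients downstream differ.
# With a fixed seed the mask (and hence the backward pass through it)
# is identical on every run.
a = dropout_mask(10, keep_prob=0.5, seed=42)
b = dropout_mask(10, keep_prob=0.5, seed=42)
print(a == b)  # the same seed yields the same mask
```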

Similar questions:

  • How to get stable results with TensorFlow, setting random seed
  • Tensorflow not being deterministic, where it should

Edit 2: This seems to be an issue with TensorFlow's implementation. See the following open issues in GitHub:

  • Problems Getting TensorFlow to behave Deterministically
  • Non-deterministic behaviour when ran on GPU
