Non-deterministic Gradient Computation
Question
I realized that my models end up being different every time I train them, even though I keep the TensorFlow random seed the same.
I verified that:
- Initialization is deterministic; the weights are identical before the first update.
- Inputs are deterministic. In fact, various forward computations, including the loss, are identical for the very first batch.
- The gradients for the first batch are different. Concretely, I'm comparing the outputs of tf.gradients(loss, train_variables). While loss and train_variables have identical values, the gradients are sometimes different for some of the Variables. The differences are quite significant (sometimes the sum-of-absolute-differences for a single variable's gradient is greater than 1).
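The sum-of-absolute-differences comparison described above can be sketched in plain NumPy, assuming the gradient arrays from two runs have already been fetched (the run_a/run_b values here are illustrative stand-ins, not real TensorFlow output):

```python
import numpy as np

def grad_diffs(grads_a, grads_b):
    """Per-variable sum of absolute differences between two runs' gradients."""
    return [float(np.abs(a - b).sum()) for a, b in zip(grads_a, grads_b)]

# Illustrative stand-ins for gradients fetched from two supposedly identical runs.
rng = np.random.RandomState(0)
base = rng.randn(3, 4).astype(np.float32)
run_a = [base, base.copy()]
run_b = [base, base + np.float32(0.25)]  # pretend the second variable diverged

diffs = grad_diffs(run_a, run_b)
# Flag variables whose gradients differ beyond floating-point noise.
suspects = [i for i, d in enumerate(diffs) if d > 1.0]
```

Running a check like this per variable narrows the search to the ops whose gradients actually diverge.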
I conclude that it's the gradient computation that causes the non-determinism.
I had a look at this question, and the problem persists when running on a CPU with intra_op_parallelism_threads=1 and inter_op_parallelism_threads=1.
How can the backward pass be non-deterministic when the forward pass isn't? How could I debug this further?
Answer
This answer might seem a little obvious, but do you use some kind of non-deterministic regularization such as dropout? Given that dropout "drops" some connections randomly during training, it may be causing the differences in the gradients.
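As an illustration of the mechanism (a NumPy sketch of inverted dropout, not TensorFlow's actual implementation): the random keep-mask drawn in the forward pass also scales the backward pass, so two runs with different mask draws yield different gradients even from identical weights, inputs, and loss values.

```python
import numpy as np

def dropout_backward(upstream_grad, rate, rng):
    """Backward pass of inverted dropout: the random keep-mask scales the gradient."""
    keep = rng.uniform(size=upstream_grad.shape) >= rate
    return upstream_grad * keep / (1.0 - rate)

g = np.ones(1000)  # upstream gradient, identical in every run

# Differently seeded (or unseeded) masks -> different gradients per run.
g_run1 = dropout_backward(g, 0.5, np.random.RandomState(1))
g_run2 = dropout_backward(g, 0.5, np.random.RandomState(2))

# Fixing the mask's seed makes the backward pass repeatable.
g_rep1 = dropout_backward(g, 0.5, np.random.RandomState(7))
g_rep2 = dropout_backward(g, 0.5, np.random.RandomState(7))
```

If dropout is in the graph, passing an explicit seed to it (or removing it temporarily) should make the gradient comparison come out identical.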
Similar questions:
- How to get stable results with TensorFlow, setting random seed
- Tensorflow not being deterministic, where it should
Edit 2: This seems to be an issue with TensorFlow's implementation. See the following open issues on GitHub:
- Problems Getting TensorFlow to behave Deterministically
- Non-deterministic behaviour when ran on GPU
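Beyond dropout, the issues above largely trace back to reduction order: floating-point addition is not associative, so a parallel backward pass (especially on GPU) that accumulates partial gradients in a run-dependent order can return different results from bit-identical inputs. A tiny float32 illustration of the effect:

```python
import numpy as np

# Three terms whose float32 sum depends on the order of accumulation.
terms = np.array([1e8, 1.0, -1e8], dtype=np.float32)

def reduce_in_order(order):
    """Accumulate the same terms in a given order, as a parallel sum might."""
    total = np.float32(0.0)
    for i in order:
        total += terms[i]
    return float(total)

s1 = reduce_in_order([0, 1, 2])  # (1e8 + 1) - 1e8: the 1 is lost to rounding
s2 = reduce_in_order([0, 2, 1])  # (1e8 - 1e8) + 1: the 1 survives
```

Here s1 is 0.0 and s2 is 1.0 from the very same three numbers, which is how a non-deterministic reduction schedule turns a deterministic forward pass into a non-deterministic backward pass.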