pytorch - connection between loss.backward() and optimizer.step()


Question

Where is an explicit connection between the optimizer and the loss?

How does the optimizer know where to get the gradients of the loss without a call like optimizer.step(loss)?

-More context-

When I minimize the loss, I don't have to pass the gradients to the optimizer.

loss.backward() # Back Propagation
optimizer.step() # Gradient Descent

Answer

Without delving too deep into the internals of pytorch, I can offer a simplistic answer:

Recall that when initializing the optimizer you explicitly tell it which parameters (tensors) of the model it should be updating. The gradients are "stored" by the tensors themselves (they have grad and requires_grad attributes) once you call backward() on the loss. After computing the gradients for all tensors in the model, calling optimizer.step() makes the optimizer iterate over all parameters (tensors) it is supposed to update and use their internally stored grad to update their values.
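
A minimal sketch of that flow (the model, data, and learning rate below are made-up placeholders, not from the question):

import torch
import torch.nn as nn

# The optimizer is told up front which tensors it should update.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)
target = torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), target)

optimizer.zero_grad()     # clear gradients left over from a previous step
loss.backward()           # autograd fills p.grad for every parameter the loss depends on
print(model.weight.grad)  # the gradient now lives on the parameter itself
optimizer.step()          # reads .grad from the registered parameters and updates them in place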

More info on computational graphs and the additional "grad" information stored in pytorch tensors can be found in this answer.
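
As a small standalone illustration of the "grad stored on the tensor" point (a toy example, not taken from the linked answer):

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # builds a computational graph from x to y

y.backward()         # populates x.grad with dy/dx = 2 * x
print(x.grad)        # tensor([2., 4., 6.])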

Referencing the parameters by the optimizer can sometimes cause trouble, e.g., when the model is moved to GPU after initializing the optimizer. Make sure you are done setting up your model before constructing the optimizer. See this answer for more details.
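
For example, following that advice (the device name and model here are illustrative), move the model first and only then construct the optimizer, so it references the parameters the model will actually train with:

import torch
import torch.nn as nn

model = nn.Linear(4, 1)
model.to("cuda")  # finish setting up / moving the model first

# Only now create the optimizer, so it holds references to the final parameters.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)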

