How can I process multiple losses in PyTorch?


Problem description

As shown below, I want to use some auxiliary losses to improve my model's performance.
Which of the following code patterns is the right way to implement this in PyTorch?

#one
loss1.backward()
loss2.backward()
loss3.backward()
optimizer.step()
#two
loss1.backward()
optimizer.step() 
loss2.backward()
optimizer.step() 
loss3.backward()
optimizer.step()   
#three
loss = loss1+loss2+loss3
loss.backward()
optimizer.step()

Thanks for your answers!

Answer

The first and third attempts are exactly the same and correct, while the second approach is completely wrong.

The reason is that in PyTorch, lower-layer gradients are not "overwritten" by subsequent backward() calls; they are accumulated, i.e. summed. This makes the first and third approaches identical, though the first may be preferable if you are low on GPU/RAM memory, since a batch size of 1024 with a single immediate backward() + step() call is the same as having 8 batches of size 128 with 8 backward() calls and one step() call at the end.
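As a rough sketch of that gradient-accumulation pattern (the model, criterion and optimizer below are placeholders for illustration, not taken from the question):

import torch
import torch.nn as nn

# Placeholder model/criterion/optimizer, only for illustration.
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(1024, 10)
targets = torch.randn(1024, 1)

optimizer.zero_grad()
for x_chunk, y_chunk in zip(inputs.chunk(8), targets.chunk(8)):
    # backward() adds into .grad rather than overwriting it, so the
    # 8 chunks of 128 accumulate one combined gradient before step().
    loss = criterion(model(x_chunk), y_chunk) / 8  # rescale so the sum matches the full-batch mean loss
    loss.backward()
optimizer.step()  # single parameter update using the accumulated gradient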

To illustrate the idea, here is a simple example. We want to get our tensor x as close as possible to [40, 50, 60] simultaneously:

import torch
criterion = torch.nn.MSELoss()  # MSE reproduces the gradient values quoted below

x = torch.tensor([1.0], requires_grad=True)
loss1 = criterion(x, torch.tensor([40.0]))
loss2 = criterion(x, torch.tensor([50.0]))
loss3 = criterion(x, torch.tensor([60.0]))

Now the first approach (we use tensor.grad to get the current gradient of our tensor x):

loss1.backward()
loss2.backward()
loss3.backward()

print(x.grad)

This outputs: tensor([-294.]) (pass retain_graph=True to the first two backward calls for more complicated computational graphs, e.g. when the losses share one forward pass, as sketched below).
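For the shared-graph case, the pattern might look like this (a hypothetical sketch; the model and targets are made up for illustration):

import torch
import torch.nn as nn

# Hypothetical example, not from the question: two losses sharing one forward pass.
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(4, 10)
target1 = torch.randn(4, 1)
target2 = torch.randn(4, 1)

out = model(inputs)                 # shared computational graph
loss1 = criterion(out, target1)
loss2 = criterion(out, target2)

optimizer.zero_grad()
loss1.backward(retain_graph=True)   # keep the shared graph alive for the next backward()
loss2.backward()                    # gradients are summed into .grad
optimizer.step()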

And the third approach:

# (assuming x.grad has been zeroed and the three losses recomputed first)
loss = loss1 + loss2 + loss3
loss.backward()
print(x.grad)

Again the output is: tensor([-294.])

The 2nd approach is different because we never call opt.zero_grad() between the step() calls. This means the gradient from the first backward() call participates in all three step() calls (and the second in two of them). For example, if the three losses contribute gradients 5, 1 and 4 to the same weight, then instead of a total gradient of 10 (= 5 + 1 + 4), the weight is effectively updated with 5*3 + 1*2 + 4*1 = 21.
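Here is a minimal sketch of that arithmetic, using made-up linear toy losses whose gradients are exactly 5, 1 and 4:

import torch

x = torch.tensor([0.0], requires_grad=True)
opt = torch.optim.SGD([x], lr=1.0)

loss1 = 5 * x   # gradient w.r.t. x is 5
loss2 = 1 * x   # gradient w.r.t. x is 1
loss3 = 4 * x   # gradient w.r.t. x is 4

loss1.backward(); opt.step()   # x.grad = 5          -> x = -5
loss2.backward(); opt.step()   # x.grad = 5 + 1      -> x = -11
loss3.backward(); opt.step()   # x.grad = 5 + 1 + 4  -> x = -21

print(x.item())  # -21.0, not the -10.0 a single summed backward() + step() would give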

Further reading: link 1, link 2
