Why is a copy of a network's weights in PyTorch automatically updated after back-propagation?


Question

I wrote the following code as a test because my original network uses a ModuleDict and, depending on the index I feed it, slices out and trains only part of the network.

I wanted to make sure that only the sliced layers update their weights, so I wrote some test code to double-check, and I am getting some weird results. Say my model has 2 layers, where layer1 is an FC layer and layer2 is a Conv2d. If I slice the network and use ONLY layer2, I would expect layer1's weights to stay unchanged because they are unused, and layer2's weights to be updated after 1 epoch.

So my plan was to use a for loop to grab all the weights from the network BEFORE training, and then do the same AFTER 1 optimizer.step(). Both times I store the weights in 2 separate Python lists so I can compare them later. For some reason the two lists are exactly the same when I compare them with torch.equal(). I thought this might be because there is still some sort of hidden link in memory, so I tried calling .detach() on the weights when I grab them in the loop, and the result is still the same. The weights of the layer that was trained should differ in this case, because the first list should contain the weights from the network before training.

Note that in the code below I am actually using layer1 and ignoring layer2.

Full code:

import torch
import torch.nn as nn
import torch.optim as optim

class mymodel(nn.Module):
    def __init__(self):
        super().__init__() 
        self.layer1 = nn.Linear(10, 5)
        self.layer2 = nn.Conv2d(1, 5, 4, 2, 1)
        self.act = nn.Sigmoid()
    def forward(self, x):
        x = self.layer1(x) # only layer1 and act are used; layer2 is ignored, so only layer1's weights should be updated
        x = self.act(x)
        return x
model = mymodel()

weights = []

for param in model.parameters(): # loop the weights in the model before updating and store them
    print(param.size())
    weights.append(param)

criterion = nn.BCELoss() # criterion and optimizer setup
optimizer = optim.Adam(model.parameters(), lr = 0.001)

foo = torch.randn(3, 10) #fake input
target = torch.rand(3, 5) # fake target in [0, 1], as BCELoss expects

result = model(foo) #predictions and comparison and backprop
loss = criterion(result, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()


weights_after_backprop = [] # weights after backprop
for param in model.parameters():
    weights_after_backprop.append(param) # only layer1's weight should update, layer2 is not used

for i in zip(weights, weights_after_backprop):
    print(torch.equal(i[0], i[1]))

# Prints True for every parameter, even though layer1's weights should differ.
# I also tried calling param.detach() in the loop, but got the same result.

Recommended answer

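You have to clone the parameters; otherwise you only copy a reference to the same tensor. Appending param stores the Parameter object itself, and .detach() returns a tensor that still shares the same underlying storage, so both follow the in-place update made by optimizer.step().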
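To see the aliasing directly, here is a minimal sketch that is not part of the original answer. It uses a single hypothetical nn.Linear layer as a stand-in for the model and simulates the optimizer step with an in-place add_, then compares a plain reference, a detached view, and a detached clone against the updated weight.

```python
import torch
import torch.nn as nn

layer = nn.Linear(10, 5)  # hypothetical stand-in for any layer of the model

alias = layer.weight                      # same Parameter object, no copy
detached = layer.weight.detach()          # new tensor object, same storage
snapshot = layer.weight.detach().clone()  # independent copy of the values

# Simulate an optimizer step with an in-place update.
with torch.no_grad():
    layer.weight.add_(1.0)

same_object = alias is layer.weight
shares_storage = detached.data_ptr() == layer.weight.data_ptr()
alias_equal = torch.equal(alias, layer.weight)
detached_equal = torch.equal(detached, layer.weight)
snapshot_equal = torch.equal(snapshot, layer.weight)

print(same_object, shares_storage, alias_equal, detached_equal, snapshot_equal)
```

Only the cloned snapshot keeps the pre-update values; the reference and the detached tensor both track the updated weight.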

# Reuses `model` from the question's code.
weights = []

for param in model.parameters():
    weights.append(param.clone())

criterion = nn.BCELoss() # criterion and optimizer setup
optimizer = optim.Adam(model.parameters(), lr=0.001)

foo = torch.randn(3, 10) # fake input
target = torch.rand(3, 5) # fake target in [0, 1], as BCELoss expects

result = model(foo) # predictions and comparison and backprop
loss = criterion(result, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()


weights_after_backprop = [] # weights after backprop
for param in model.parameters():
    weights_after_backprop.append(param.clone()) # only layer1's weight should update, layer2 is not used

for i in zip(weights, weights_after_backprop):
    print(torch.equal(i[0], i[1]))

This prints:

False
False
True
True
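The four values correspond to layer1's weight and bias (changed) and layer2's weight and bias (unchanged). A variation on the same idea, sketched here under my own assumptions, keys the snapshots by parameter name so the output says which parameters moved. The helper names snapshot_params and changed_params and the TwoLayer class are hypothetical, and detach().clone() is used instead of clone() only so the snapshots stay out of the autograd graph.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def snapshot_params(module):
    """Return {name: independent copy} for every parameter of `module`."""
    return {name: p.detach().clone() for name, p in module.named_parameters()}


def changed_params(before, after):
    """Return the names whose values differ between two snapshots."""
    return [name for name in before if not torch.equal(before[name], after[name])]


class TwoLayer(nn.Module):  # hypothetical model mirroring the question's
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(10, 5)
        self.layer2 = nn.Conv2d(1, 5, 4, 2, 1)  # defined but never used in forward
        self.act = nn.Sigmoid()

    def forward(self, x):
        return self.act(self.layer1(x))


torch.manual_seed(0)
model = TwoLayer()
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.BCELoss()

before = snapshot_params(model)
loss = criterion(model(torch.randn(3, 10)), torch.rand(3, 5))
optimizer.zero_grad()
loss.backward()
optimizer.step()
after = snapshot_params(model)

changed = changed_params(before, after)
print(changed)
```

With this setup the list should contain only layer1.weight and layer1.bias, since layer2 never receives a gradient and the optimizer skips it.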
