Why is a copy of a network's weights in PyTorch automatically updated after back-propagation?

Question

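I wrote the following code as a test because my original network uses a ModuleDict and, depending on the index I feed it, slices out and trains only part of the network.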

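I wanted to make sure that only the sliced layers update their weights, so I wrote some test code to double-check, and I am getting some weird results. Say my model has 2 layers, where layer1 is a fully connected (FC) layer and layer2 is a Conv2d. If I slice the network and use only layer2, I would expect layer1's weights to stay unchanged because they are unused, and layer2's weights to be updated after 1 epoch.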

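My plan was to use a for loop to grab all the weights from the network before training, and then again after one optimizer.step(). Both times I store the weights separately, in two Python lists, so I can compare them later. For some reason the two lists are exactly the same when I compare them with torch.equal(). I thought there might still be some sort of hidden link in memory, so I tried calling .detach() on the weights as I grab them in the loop, but the result is the same. In the scenario above, layer2's weights should differ, because the first list should contain the weights from before training.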

Note that in the code below I am actually using layer1 and ignoring layer2.

Full code:

import torch
import torch.nn as nn
import torch.optim as optim

class mymodel(nn.Module):
    def __init__(self):
        super().__init__() 
        self.layer1 = nn.Linear(10, 5)
        self.layer2 = nn.Conv2d(1, 5, 4, 2, 1)
        self.act = nn.Sigmoid()
    def forward(self, x):
        x = self.layer1(x) # only layer1 and act are used; layer2 is ignored, so only layer1's parameters should be updated (Sigmoid has none)
        x = self.act(x)
        return x
model = mymodel()

weights = []

for param in model.parameters(): # loop the weights in the model before updating and store them
    print(param.size())
    weights.append(param)

criterion = nn.BCELoss() # criterion and optimizer setup
optimizer = optim.Adam(model.parameters(), lr = 0.001)

foo = torch.randn(3, 10) #fake input
target = torch.rand(3, 5) # fake target; BCELoss expects targets in [0, 1]

result = model(foo) #predictions and comparison and backprop
loss = criterion(result, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()


weights_after_backprop = [] # weights after backprop
for param in model.parameters():
    weights_after_backprop.append(param) # only layer1's weight should update, layer2 is not used

for i in zip(weights, weights_after_backprop):
    print(torch.equal(i[0], i[1]))

# Prints True for every parameter, although layer1's weight and bias should differ.
# Calling param.detach() in the loop gives the same result.

Solution

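You have to clone the parameters; otherwise you only copy the reference. Both lists then hold the very same Parameter objects, and optimizer.step() updates those objects in place. Calling .detach() does not help, because the detached tensor still shares its storage with the parameter.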
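The difference between a reference, a .detach() result and a .clone() copy can be seen on a single tensor. The following is a minimal sketch and not part of the original answer; the names w, alias, detached and snapshot are made up for illustration, and the in-place add_ only stands in for what optimizer.step() does to a parameter.

```python
import torch

# A stand-in for one model parameter (hypothetical name `w`).
w = torch.nn.Parameter(torch.ones(3))

alias = w                      # same Python object, nothing is copied
detached = w.detach()          # new tensor object that shares w's storage
snapshot = w.detach().clone()  # independent copy of the current values

# Stand-in for optimizer.step(): an in-place update of the parameter.
with torch.no_grad():
    w.add_(1.0)

print(alias is w)                            # True: the alias is the parameter itself
print(detached.data_ptr() == w.data_ptr())   # True: same underlying memory
print(torch.equal(detached, w))              # True: the "copy" followed the update
print(torch.equal(snapshot, w))              # False: the clone kept the old values
```

Applied to the code from the question, the fix looks like this: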

weights = []

for param in model.parameters():
    weights.append(param.clone())

criterion = nn.BCELoss() # criterion and optimizer setup
optimizer = optim.Adam(model.parameters(), lr=0.001)

foo = torch.randn(3, 10) # fake input
target = torch.rand(3, 5) # fake target; BCELoss expects targets in [0, 1]

result = model(foo) # predictions and comparison and backprop
loss = criterion(result, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()


weights_after_backprop = [] # weights after backprop
for param in model.parameters():
    weights_after_backprop.append(param.clone()) # only layer1's weight should update, layer2 is not used

for i in zip(weights, weights_after_backprop):
    print(torch.equal(i[0], i[1]))

which gives

False
False
True
True
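The last two results are True because layer2 never takes part in forward(), so its parameters receive no gradient and the optimizer leaves them alone. A list of booleans does not say which parameter is which, though. As a rough sketch of one way to get a per-name report, the example below snapshots the parameters via named_parameters(); the helpers snapshot_params and changed_params and the TwoLayer model are hypothetical names introduced here, not part of the original answer or of PyTorch.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def snapshot_params(model):
    """Return {name: independent copy} for every parameter of `model`."""
    return {name: p.detach().clone() for name, p in model.named_parameters()}


def changed_params(before, model):
    """Map each parameter name to True if its values differ from `before`."""
    return {
        name: not torch.equal(before[name], p.detach())
        for name, p in model.named_parameters()
    }


# Hypothetical model in which only `used` takes part in forward().
class TwoLayer(nn.Module):
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(10, 5)
        self.unused = nn.Linear(10, 5)

    def forward(self, x):
        return torch.sigmoid(self.used(x))


torch.manual_seed(0)
model = TwoLayer()
optimizer = optim.Adam(model.parameters(), lr=0.001)
before = snapshot_params(model)  # cloned values, taken before the update

# One training step on fake data (targets in [0, 1] for BCELoss).
loss = nn.BCELoss()(model(torch.randn(3, 10)), torch.rand(3, 5))
optimizer.zero_grad()
loss.backward()
optimizer.step()

report = changed_params(before, model)
for name, changed in report.items():
    print(name, "changed" if changed else "unchanged")

# Parameters that never entered the graph have no gradient at all.
print(model.unused.weight.grad is None)
```

In this sketch only used.weight and used.bias are reported as changed. copy.deepcopy(model.state_dict()) is another common way to take such a snapshot; the deep copy is needed for the same reason as clone(), since the tensors returned by state_dict() share storage with the live parameters.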
