Lack of Sparse Solution with L1 Regularization in PyTorch

Problem Description

I am trying to apply L1 regularization to the first layer of a simple neural network (one hidden layer). I looked at some other StackOverflow posts that apply L1 regularization with PyTorch to figure out how it should be done (references: Adding L1/L2 regularization in PyTorch?, In Pytorch, how to add L1 regularizer to activations?). No matter how high I set lambda (the L1 regularization strength parameter), I do not get true zeros in the first weight matrix. Why would this be? (Code is below.)

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

class Network(nn.Module):
    def __init__(self,nf,nh,nc):
        super(Network,self).__init__()
        self.lin1=nn.Linear(nf,nh)
        self.lin2=nn.Linear(nh,nc)

    def forward(self,x):
        l1out=F.relu(self.lin1(x))
        out=F.softmax(self.lin2(l1out),dim=-1)  # explicit dim (implicit dim is deprecated)
        return out, l1out

def l1loss(layer):
    return torch.norm(layer.weight.data, p=1)

nf=10
nc=2
nh=6
learningrate=0.02
lmbda=10.
batchsize=50

net=Network(nf,nh,nc)

crit=nn.MSELoss()
optimizer=torch.optim.Adagrad(net.parameters(),lr=learningrate)


# xtr, ytr, xte, yte: training/test data, not constructed in the question
# (a possible construction is sketched after the code listing)
xtr=torch.Tensor(xtr)
ytr=torch.Tensor(ytr)
#ytr=torch.LongTensor(ytr)
xte=torch.Tensor(xte)
yte=torch.LongTensor(yte)
#cyte=torch.Tensor(yte)

it=200
for epoch in range(it):
    per=torch.randperm(len(xtr))
    for i in range(0,len(xtr),batchsize):
        ind=per[i:i+batchsize]
        bx,by=xtr[ind],ytr[ind]            
        optimizer.zero_grad()
        output, l1out=net(bx)
#        l1reg=l1loss(net.lin1)    
        loss=crit(output,by)+lmbda*l1loss(net.lin1)
        loss.backward()
        optimizer.step()
    print('Epoch [%i/%i], Loss: %.4f' %(epoch+1,it, np.float32(loss.data.numpy())))

corr=0
tot=0
for x,y in list(zip(xte,yte)):
    output,_=net(x)
    _,pred=torch.max(output,-1)
    tot+=1 #y.size(0)
    corr+=(pred==y).sum()
print(corr)

Note: The data has 10 features (2 classes and 800 training samples) and only the first 2 are relevant (by design), so one would assume true zeros should be easy enough to learn.
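
The question does not show how xtr, ytr, xte and yte are built. A minimal sketch that matches the description above (10 features, 800 training samples, two classes, only the first two features informative) could look like the following; the test-set size (200) and the exact labelling rule are assumptions made for illustration:

import numpy as np

rng = np.random.RandomState(0)

ntr, nte, nf = 800, 200, 10                 # 800 training samples, 10 features
X = rng.randn(ntr + nte, nf).astype(np.float32)
# Only the first two features determine the class (by design); the rest are noise.
labels = (X[:, 0] + X[:, 1] > 0).astype(np.int64)

xtr = X[:ntr]
ytr = np.eye(2, dtype=np.float32)[labels[:ntr]]   # one-hot targets for MSELoss
xte = X[ntr:]
yte = labels[ntr:]                                # class indices for the accuracy check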

Recommended Answer

Your use of layer.weight.data removes the parameter (which is a PyTorch variable) from its automatic differentiation context, making it a constant as far as the optimiser is concerned. As a result, the L1 term contributes zero gradient and the penalty has no effect on the weights.
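
A quick check (not part of the original code) makes the problem visible: the tensor returned by .data is detached from autograd, so a norm built from it carries no gradient information.

import torch
import torch.nn as nn

lin = nn.Linear(10, 6)
penalty = torch.norm(lin.weight.data, p=1)   # built from the detached .data tensor
print(penalty.requires_grad)                 # False: nothing flows back to lin.weight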

If you remove the .data, the norm is computed on the PyTorch variable itself and the gradients will be correct.
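
Concretely, the fix is a one-line change to the penalty function from the question (same name and signature, only the .data removed):

def l1loss(layer):
    # layer.weight is an nn.Parameter, so the norm stays inside the autograd graph
    return torch.norm(layer.weight, p=1)

An equivalent formulation is layer.weight.abs().sum(), which likewise keeps the penalty differentiable with respect to the weights.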

For more information on PyTorch's automatic differentiation mechanics, see this docs article or this tutorial.
