Why does regularization in PyTorch and scratch code not match, and what is the formula used for regularization in PyTorch?


Problem Description


I have been trying to do L2 regularization on a binary classification model in PyTorch, but when I compare the results of PyTorch and my scratch code they don't match. PyTorch code:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader

class LogisticRegression(nn.Module):
  def __init__(self,n_input_features):
    super(LogisticRegression,self).__init__()
    self.linear=nn.Linear(n_input_features,1)
    self.linear.weight.data.fill_(0.0)   # start from all-zero weights
    self.linear.bias.data.fill_(0.0)     # and zero bias

  def forward(self,x):
    y_predicted=torch.sigmoid(self.linear(x))
    return y_predicted

model=LogisticRegression(4)

criterion=nn.BCELoss()
optimizer=torch.optim.SGD(model.parameters(),lr=0.05,weight_decay=0.1)
dataset=Data()   # Data() is a custom Dataset defined elsewhere in the question's code
train_data=DataLoader(dataset=dataset,batch_size=1096,shuffle=False)

num_epochs=1000
for epoch in range(num_epochs):
  for x,y in train_data:
    y_pred=model(x)
    loss=criterion(y_pred,y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

Scratch Code:

import numpy as np

def sigmoid(z):
    s = 1/(1+ np.exp(-z))
    return s

def yinfer(X, beta):
  return sigmoid(beta[0] + np.dot(X,beta[1:]))

def cost(X, Y, beta, lam):
    sum = 0
    sum1 = 0
    n = len(beta)
    m = len(Y)
    for i in range(m): 
        sum = sum + Y[i]*(np.log( yinfer(X[i],beta)))+ (1 -Y[i])*np.log(1-yinfer(X[i],beta))
    for i in range(0, n): 
        sum1 = sum1 + beta[i]**2
        
    return  (-sum + (lam/2) * sum1)/(1.0*m)

def pred(X,beta):
  if ( yinfer(X, beta) > 0.5):
    ypred = 1
  else :
    ypred = 0
  return ypred

lam = 0.1     # L2 strength, matching weight_decay=0.1 in the PyTorch code
alpha = 0.05  # learning rate, matching lr=0.05 in the PyTorch code
beta = np.zeros(5)
iterations = 1000
arr_cost = np.zeros((iterations,4))
print(beta)
n = len(Y_train)
for i in range(iterations):
    Y_prediction_train=np.zeros(len(Y_train))
    Y_prediction_test=np.zeros(len(Y_test)) 

    for l in range(len(Y_train)):
        Y_prediction_train[l]=pred(X[l,:],beta)
    
    for l in range(len(Y_test)):
        Y_prediction_test[l]=pred(X_test[l,:],beta)
    
    train_acc = 100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100
    test_acc = 100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100   
    arr_cost[i,:] = [i,cost(X,Y_train,beta,lam),train_acc,test_acc]
    temp_beta = np.zeros(len(beta))

    ''' main code from below '''

    for j in range(n): 
        temp_beta[0] = temp_beta[0] + yinfer(X[j,:], beta) - Y_train[j]
        temp_beta[1:] = temp_beta[1:] + (yinfer(X[j,:], beta) - Y_train[j])*X[j,:]
    
    for k in range(0, len(beta)):
        temp_beta[k] = temp_beta[k] +  lam * beta[k]  #regularization here
    
    temp_beta= temp_beta / (1.0*n)
    
    beta = beta - alpha*temp_beta

[image: graph of the losses]

[image: graph of training accuracy]

[image: graph of testing accuracy]

Can someone please tell me why this is happening? (L2 value = 0.1)

Solution

Great question. I dug a lot through the PyTorch documentation and found the answer. The answer is quite tricky: basically, there are two ways to compute the regularization term. (For a summary, jump to the last section.)

PyTorch uses the first type, in which the regularization factor is not divided by the batch size. The two formulations are sketched below.
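To make the difference concrete, here is a sketch of the two cost functions in LaTeX, using the notation of the scratch code (m is the batch size, \lambda is weight_decay / lam, and loss_i is the loss on example i); differentiating them gives the two update rules worked out further down:

\text{Type 1 (PyTorch):}\quad J_1(\beta) = \frac{1}{m}\sum_{i=1}^{m}\mathrm{loss}_i + \frac{\lambda}{2}\sum_j \beta_j^2 \;\Rightarrow\; \nabla_\beta J_1 = \frac{1}{m}\sum_{i=1}^{m}\nabla\mathrm{loss}_i + \lambda\beta

\text{Type 2 (scratch code):}\quad J_2(\beta) = \frac{1}{m}\Big(\sum_{i=1}^{m}\mathrm{loss}_i + \frac{\lambda}{2}\sum_j \beta_j^2\Big) \;\Rightarrow\; \nabla_\beta J_2 = \frac{1}{m}\sum_{i=1}^{m}\nabla\mathrm{loss}_i + \frac{\lambda}{m}\beta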

Here's a sample code which demonstrates that:

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import torch.optim as optim
 
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)
        self.linear.weight.data.fill_(1.0)   # fixed initial weight = 1
        self.linear.bias.data.fill_(1.0)     # fixed initial bias = 1

    def forward(self, x):
        return self.linear(x)


model     = Model()
optimizer = optim.SGD(model.parameters(), lr=0.1, weight_decay=1.0)

input     = torch.tensor([[2], [4]], dtype=torch.float32)
target    = torch.tensor([[7], [11]], dtype=torch.float32)

optimizer.zero_grad()
pred      = model(input)
loss      = F.mse_loss(pred, target)

print(f'input: {input[0].data, input[1].data}')
print(f'prediction: {pred[0].data, pred[1].data}')
print(f'target: {target[0].data, target[1].data}')

print(f'\nMSEloss: {loss.item()}\n')

loss.backward()

print('Before updation:')
print('--------------------------------------------------------------------------')
print(f'weight [data, gradient]: {model.linear.weight.data, model.linear.weight.grad}')
print(f'bias [data, gradient]: {model.linear.bias.data, model.linear.bias.grad}')
print('--------------------------------------------------------------------------')
 
optimizer.step()

print('After updation:')
print('--------------------------------------------------------------------------')
print(f'weight [data]: {model.linear.weight.data}')
print(f'bias [data]: {model.linear.bias.data}')
print('--------------------------------------------------------------------------')

which outputs:

input: (tensor([2.]), tensor([4.]))
prediction: (tensor([3.]), tensor([5.]))
target: (tensor([7.]), tensor([11.]))

MSEloss: 26.0

Before updation:
--------------------------------------------------------------------------
weight [data, gradient]: (tensor([[1.]]), tensor([[-32.]]))
bias [data, gradient]: (tensor([1.]), tensor([-10.]))
--------------------------------------------------------------------------
After updation:
--------------------------------------------------------------------------
weight [data]: tensor([[4.1000]])
bias [data]: tensor([1.9000])
--------------------------------------------------------------------------

Here m = batch size = 2, lr = alpha = 0.1, lambda = weight_decay = 1.

Now consider the tensor weight, which has value = 1 and grad = -32:

case 1 (type 1 regularization):

 weight = weight - lr * (grad + weight_decay * weight)
 weight = 1 - 0.1 * (-32 + 1 * 1)
 weight = 4.1

case 2 (type 2 regularization):

 weight = weight - lr * (grad + (weight_decay / batch_size) * weight)
 weight = 1 - 0.1 * (-32 + (1/2) * 1)
 weight = 4.15
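A quick numeric sanity check of the two update rules, using only the numbers printed above (a minimal sketch in plain Python):

lr, weight_decay, batch_size = 0.1, 1.0, 2
weight, grad = 1.0, -32.0

type1 = weight - lr * (grad + weight_decay * weight)                  # type 1: ≈ 4.1
type2 = weight - lr * (grad + (weight_decay / batch_size) * weight)   # type 2: ≈ 4.15
print(type1, type2)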

From the output we can see that the updated weight = 4.1000, which confirms that PyTorch uses type 1 regularization.

So, finally: in your code you are following type 2 regularization. Just change the last few lines to this:

# for k in range(0, len(beta)):
#    temp_beta[k] = temp_beta[k] +  lam * beta[k]  #regularization here

temp_beta= temp_beta / (1.0*n)

beta = beta - alpha*(temp_beta + lam * beta)

Also, PyTorch loss functions don't include the regularization term (it is implemented inside the optimizers), so remove the regularization term from your custom cost function as well; a sketch of the adjusted cost function follows.
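For reference, a minimal sketch of what the adjusted cost function could look like once the penalty term is dropped (it reuses the yinfer helper from the question and is only illustrative):

def cost(X, Y, beta):
    # plain average binary cross-entropy; the L2 penalty is handled in the
    # weight update (or by the optimizer's weight_decay in PyTorch)
    m = len(Y)
    total = 0
    for i in range(m):
        p = yinfer(X[i], beta)
        total = total + Y[i]*np.log(p) + (1 - Y[i])*np.log(1 - p)
    return -total/(1.0*m)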

In summary:

  1. PyTorch uses this regularization (type 1): the penalty (weight_decay / 2) · Σ wᵢ² is added without dividing by the batch size, which is equivalent to adding weight_decay · w to each parameter's gradient.

  2. Regularization is implemented inside the optimizers (the weight_decay parameter).

  3. PyTorch loss functions don't include the regularization term.

  4. The bias is also regularized when weight_decay is used.

  5. To use regularization, try:

    torch.optim.optimizer_name(model.parameters(), lr, weight_decay=lambda)
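For example, with the hyperparameters from the question's code this would be:

optimizer = torch.optim.SGD(model.parameters(), lr=0.05, weight_decay=0.1)

If you would rather keep the penalty in the loss, the following sketch adds an explicit L2 term instead of weight_decay (it assumes plain SGD, where the two are equivalent because the gradient of (lam/2)*w**2 is lam*w, and it reuses the model, criterion, y_pred and y names from the question's training loop):

lam = 0.1
l2 = sum((p ** 2).sum() for p in model.parameters())   # includes the bias, like weight_decay
loss = criterion(y_pred, y) + (lam / 2) * l2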
