Label Smoothing in PyTorch


Question

I'm building a ResNet-18 classification model for the Stanford Cars dataset using transfer learning. I would like to implement label smoothing to penalize overconfident predictions and improve generalization.

TensorFlow has a simple label_smoothing keyword argument in its cross-entropy loss functions. Has anyone built a similar function for PyTorch that I could plug-and-play with?

Answer

The generalization and learning speed of a multi-class neural network can often be significantly improved by using soft targets that are a weighted average of the hard targets and the uniform distribution over labels. Smoothing the labels in this way prevents the network from becoming over-confident, and label smoothing has been used in many state-of-the-art models, including image classification, language translation, and speech recognition.
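
As a quick numerical illustration (a minimal sketch; the class count and smoothing factor are made up), the smoothed target for a single sample is exactly the weighted average described above:

import torch
import torch.nn.functional as F

num_classes = 5
smoothing = 0.1                               # weight given to the uniform distribution
hard_label = torch.tensor([2])                # one sample whose true class is 2
one_hot = F.one_hot(hard_label, num_classes).float()
# weighted average of the hard target and the uniform distribution over labels
soft_target = (1.0 - smoothing) * one_hot + smoothing / num_classes
print(soft_target)  # tensor([[0.0200, 0.0200, 0.9200, 0.0200, 0.0200]])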

Label smoothing is already implemented in TensorFlow within the cross-entropy loss functions BinaryCrossentropy and CategoricalCrossentropy. But currently, there is no official implementation of label smoothing in PyTorch. However, there is an active discussion about it, and hopefully an official implementation will be provided. Here is the discussion thread: Issue #7455.

Here we collect some of the best available implementations of label smoothing (LS) from PyTorch practitioners. There are many ways to implement LS; please refer to the specific discussion on this, one here, and another here. We present implementations in 2 distinct ways, with two versions of each, for a total of 4.

Option 1: Manually smoothed target vector

In this approach, the loss works with a one-hot-style target distribution, so the user must smooth the target vector manually. This can be done within a with torch.no_grad() scope, since gradient tracking is disabled there and no gradients need to flow through the target construction.

  1. Devin Yang: Source

import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss


class LabelSmoothingLoss(nn.Module):
    def __init__(self, classes, smoothing=0.0, dim=-1, weight = None):
        """if smoothing == 0, it's one-hot method
           if 0 < smoothing < 1, it's smooth method
        """
        super(LabelSmoothingLoss, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing
        self.weight = weight
        self.cls = classes
        self.dim = dim

    def forward(self, pred, target):
        assert 0 <= self.smoothing < 1
        pred = pred.log_softmax(dim=self.dim)

        if self.weight is not None:
            # apply optional per-class weights to the log-probabilities
            pred = pred * self.weight.unsqueeze(0)

        with torch.no_grad():
            # build the smoothed target distribution: smoothing/(C-1) mass on
            # every wrong class, and confidence = 1 - smoothing on the true class
            true_dist = torch.zeros_like(pred)
            true_dist.fill_(self.smoothing / (self.cls - 1))
            true_dist.scatter_(1, target.data.unsqueeze(1), self.confidence)
        return torch.mean(torch.sum(-true_dist * pred, dim=self.dim))

Additionally, we've added an assertion check on self.smoothing and added loss-weighting support to this implementation.
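
For instance, per-class weighting can be plugged in as follows (the weight values below are made up purely for illustration; this reuses the LabelSmoothingLoss class and imports defined above):

# hypothetical per-class weights for a 5-class problem
class_weights = torch.tensor([1.0, 2.0, 1.0, 0.5, 1.0])
crit = LabelSmoothingLoss(classes=5, smoothing=0.1, weight=class_weights)
loss = crit(torch.randn(3, 5), torch.LongTensor([2, 1, 0]))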

  2. Shital Shah: Source

Shital already posted the answer here. We point out that this implementation is similar to Devin Yang's implementation above; however, here we present his code with the syntax trimmed slightly.

class SmoothCrossEntropyLoss(_WeightedLoss):
    def __init__(self, weight=None, reduction='mean', smoothing=0.0):
        super().__init__(weight=weight, reduction=reduction)
        self.smoothing = smoothing
        self.weight = weight
        self.reduction = reduction

    def k_one_hot(self, targets: torch.Tensor, n_classes: int, smoothing=0.0):
        # build smoothed targets: smoothing/(C-1) on every wrong class,
        # 1 - smoothing on the true class
        with torch.no_grad():
            targets = torch.empty(size=(targets.size(0), n_classes),
                                  device=targets.device) \
                                  .fill_(smoothing / (n_classes - 1)) \
                                  .scatter_(1, targets.data.unsqueeze(1), 1. - smoothing)
        return targets

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
            if self.reduction == 'sum' else loss

    def forward(self, inputs, targets):
        assert 0 <= self.smoothing < 1

        targets = self.k_one_hot(targets, inputs.size(-1), self.smoothing)
        log_preds = F.log_softmax(inputs, -1)

        if self.weight is not None:
            log_preds = log_preds * self.weight.unsqueeze(0)

        return self.reduce_loss(-(targets * log_preds).sum(dim=-1))
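
As a quick sanity check (a small sketch with random logits, reusing the imports and the class defined above): with smoothing=0.0 the generated targets are plain one-hot vectors, so the result should coincide with PyTorch's standard cross-entropy.

logits = torch.randn(3, 5)
labels = torch.LongTensor([2, 1, 0])
crit = SmoothCrossEntropyLoss(smoothing=0.0)
print(torch.allclose(crit(logits, labels), F.cross_entropy(logits, labels)))  # True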

Check

import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss


if __name__=="__main__":
    # 1. Devin Yang
    crit = LabelSmoothingLoss(classes=5, smoothing=0.5)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1], 
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

    # 2. Shital Shah
    crit = SmoothCrossEntropyLoss(smoothing=0.5)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1], 
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

tensor(1.4178)
tensor(1.4178)


Option 2: LabelSmoothingCrossEntropyLoss

In this approach, the loss accepts the target vector of class indices directly, and the user does not smooth the target vector manually; instead, the module takes care of the label smoothing itself. It allows us to implement label smoothing in terms of F.nll_loss, combining the standard NLL loss with a uniform cross-entropy term.

(a). Wangleiofficial: Source - (AFAIK), Original Poster

(b). Datasaurus: Source - Added Weighting Support

Further, we slightly trim the code to make it more concise.

class LabelSmoothingLoss(torch.nn.Module):
    def __init__(self, smoothing: float = 0.1, 
                 reduction="mean", weight=None):
        super(LabelSmoothingLoss, self).__init__()
        self.smoothing   = smoothing
        self.reduction = reduction
        self.weight    = weight

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
            if self.reduction == 'sum' else loss

    def linear_combination(self, x, y):
        return self.smoothing * x + (1 - self.smoothing) * y

    def forward(self, preds, target):
        assert 0 <= self.smoothing < 1

        if self.weight is not None:
            self.weight = self.weight.to(preds.device)

        n = preds.size(-1)
        log_preds = F.log_softmax(preds, dim=-1)
        loss = self.reduce_loss(-log_preds.sum(dim=-1))
        nll = F.nll_loss(
            log_preds, target, reduction=self.reduction, weight=self.weight
        )
        return self.linear_combination(loss / n, nll)

  2. NVIDIA/DeepLearningExamples: Source

class LabelSmoothing(nn.Module):
    """NLL loss with label smoothing.
    """
    def __init__(self, smoothing=0.0):
        """Constructor for the LabelSmoothing module.
        :param smoothing: label smoothing factor
        """
        super(LabelSmoothing, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing

    def forward(self, x, target):
        logprobs = torch.nn.functional.log_softmax(x, dim=-1)
        # negative log-likelihood of the true class
        nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1))
        nll_loss = nll_loss.squeeze(1)
        # cross-entropy against the uniform distribution over classes
        smooth_loss = -logprobs.mean(dim=-1)
        loss = self.confidence * nll_loss + self.smoothing * smooth_loss
        return loss.mean()

Check

if __name__=="__main__":
    # Wangleiofficial
    crit = LabelSmoothingLoss(smoothing=0.3, reduction="mean")
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1], 
                                 [1, 0.2, 0.7, 0.9, 1]])

    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

    # NVIDIA
    crit = LabelSmoothing(smoothing=0.3)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1], 
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

tensor(1.3883)
tensor(1.3883)


Update: Officially added — torch.nn.CrossEntropyLoss now supports a label_smoothing argument:

torch.nn.CrossEntropyLoss(weight=None, size_average=None,
                          ignore_index=-100, reduce=None,
                          reduction='mean', label_smoothing=0.0)
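
With this built-in support (available in PyTorch 1.10 and later), the plug-and-play solution asked for in the question becomes a one-liner. A minimal usage sketch on the same made-up logits as the checks above:

import torch

crit = torch.nn.CrossEntropyLoss(label_smoothing=0.3)
predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                             [0, 0.9, 0.2, 0.2, 1],
                             [1, 0.2, 0.7, 0.9, 1]])
target = torch.LongTensor([2, 1, 0])
print(crit(predict, target))  # should match the Option 2 result above, tensor(1.3883)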
