What are C classes for a NLLLoss loss function in Pytorch?

Question

I'm asking about the C classes for the NLLLoss loss function.

The documentation states:

The negative log likelihood loss. It is useful to train a classification problem with C classes.

Basically everything after that point depends on knowing what a C class is, and I thought I knew, but the documentation doesn't make much sense to me. In particular, it describes the expected input as (N, C) where C = number of classes. That's where I'm confused, because I thought a C class referred to the output only. My understanding was that the C class was a one-hot vector of classifications. I've often found in tutorials that NLLLoss is paired with a LogSoftmax to solve classification problems.

I was expecting to use NLLLoss in the following example:

import torch
import torch.nn as nn

# Some random training data
input = torch.randn(5, requires_grad=True)
print(input)  # tensor([-1.3533, -1.3074, -1.7906,  0.3113,  0.7982], requires_grad=True)
# Build my NN (here it's just a LogSoftmax)
m = nn.LogSoftmax(dim=0)
# Train my NN with the data
output = m(input)
print(output)  # tensor([-2.8079, -2.7619, -3.2451, -1.1432, -0.6564], grad_fn=<LogSoftmaxBackward>)
loss = nn.NLLLoss()
print(loss(output, torch.tensor([1, 0, 0])))

The above raises the following error on the last line:

ValueError: Expected 2 or more dimensions (got 1)

We can ignore the error, because clearly I don't understand what I'm doing. Here I'll explain my intentions for the source code above.

input = torch.randn(5, requires_grad=True)

A random 1D array to pair with the one-hot vector [1, 0, 0] for training. I'm trying to map binary bits to one-hot vectors of decimal numbers.

m = nn.LogSoftmax(dim=0)

The documentation for LogSoftmax says that the output will be the same shape as the input, but I've only seen examples of LogSoftmax(dim=1), so I've been stuck trying to make this work because I can't find a relevant example.

print(loss(output, torch.tensor([1, 0, 0])))

So now I have the output of the NN, and I want to know the loss for my classification [1, 0, 0]. It doesn't really matter in this example what any of the data is. I just want a loss for a one-hot vector that represents a classification.

At this point I get stuck trying to resolve errors from the loss function relating to the expected output and input structures. I've tried using view(...) on the output and input to fix the shapes, but that just gets me other errors.

So this goes back to my original question, and I'll show the example from the documentation to explain my confusion:

import torch
import torch.nn as nn

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
input = torch.randn(3, 5, requires_grad=True)
train = torch.tensor([1, 0, 4])
print('input', input)  # input tensor([[...],[...],[...]], requires_grad=True)
output = m(input)
print('train', output, train)  # tensor([[...],[...],[...]], grad_fn=<LogSoftmaxBackward>) tensor([1, 0, 4])
x = loss(output, train)

Again, we have dim=1 on LogSoftmax, which confuses me now, because look at the input data. It's a 3x5 tensor and I'm lost.

Here's the documentation on the first input for the NLLLoss function:

Input: (N, C) where C = number of classes

Are the inputs grouped by the number of classes? So each row of the input tensor is associated with each element of the training tensor?

If I change the second dimension of the input tensor, then nothing breaks and I don't understand what is going on.

input = torch.randn(3, 100, requires_grad=True)
# 3 x 100 still works?

So I don't understand what a C class is here; I thought a C class was a classification (like a label), meaningful only for the outputs of the NN.

I hope you understand my confusion, because shouldn't the shape of the inputs to the NN be independent of the shape of the one-hot vector used for classification?

Both the code examples and the documentation say that the shape of the inputs is defined by the number of classifications, and I don't really understand why.

I have tried studying the documentation and tutorials to understand what I'm missing, but after several days of not being able to get past this point I've decided to ask this question. It's been humbling, because I thought this was going to be one of the easier things to learn.

Answer

Basically you are missing the concept of a batch.

Long story short, every input to the loss (and the input passed through the network) requires a batch dimension (i.e. how many samples are used).
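
For example, a minimal sketch (values illustrative) of how a single sample gets an explicit batch dimension via unsqueeze(0):

import torch
import torch.nn as nn

single = torch.randn(5)        # one sample with 5 class scores, no batch dimension
batched = single.unsqueeze(0)  # shape (1, 5): a batch containing one sample
log_probs = nn.LogSoftmax(dim=1)(batched)
print(nn.NLLLoss()(log_probs, torch.tensor([0])))  # one class index per sample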

Breaking it down, step by step. Each step is compared side by side to make it clearer (documentation version on top, your example below).

input = torch.randn(3, 5, requires_grad=True)  # documentation: batch of 3 samples, 5 features each
input = torch.randn(5, requires_grad=True)     # yours: a bare 1D tensor, batch dimension only

In the first case (docs), an input with 5 features is created and 3 samples are used. In your case, there is only the batch dimension (5 samples); you have none of the features, which are required. If you meant to have one sample with 5 features, you should do:

input = torch.randn(1, 5, requires_grad=True)
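
Putting that together, a corrected version of the snippet might look like this (the target class 0 is an arbitrary illustrative choice):

import torch
import torch.nn as nn

input = torch.randn(1, 5, requires_grad=True)  # batch of 1 sample, 5 classes
m = nn.LogSoftmax(dim=1)                       # normalize over the class dimension
output = m(input)                              # shape (1, 5)
loss = nn.NLLLoss()
print(loss(output, torch.tensor([0])))         # target: class index 0 for the one sample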

LogSoftmax

LogSoftmax is done across the features dimension; you are doing it across the batch.

m = nn.LogSoftmax(dim=1)  # apply over features
m = nn.LogSoftmax(dim=0)  # apply over batch

The latter usually makes no sense for this operation, as the samples are independent of each other.
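
A quick sketch (random values) shows the difference: exponentiating the result of LogSoftmax over dim=1 gives one probability distribution per sample, while dim=0 builds distributions across samples:

import torch
import torch.nn as nn

x = torch.randn(3, 5)                  # 3 samples, 5 classes
over_features = nn.LogSoftmax(dim=1)(x)
print(over_features.exp().sum(dim=1))  # tensor([1., 1., 1.]): one distribution per sample
over_batch = nn.LogSoftmax(dim=0)(x)
print(over_batch.exp().sum(dim=0))     # sums to 1 per column: distributions mixing samples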

Since this is multiclass classification and each element in the target vector represents a sample, one can pass as many numbers as one wants (as long as each one is smaller than the number of features; in the documentation example that's 5, hence any label in [0-4] is fine).

train = torch.tensor([1, 0, 4])  # documentation: one class index per sample, each < 5
train = torch.tensor([1, 0, 0])  # yours: valid as class indices, but intended as a one-hot vector
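
As a sketch of that constraint: with 5 classes, any index in [0-4] is accepted, while anything larger fails:

import torch
import torch.nn as nn

output = nn.LogSoftmax(dim=1)(torch.randn(3, 5))     # 3 samples, 5 classes
print(nn.NLLLoss()(output, torch.tensor([1, 0, 4]))) # fine: every index < 5
# nn.NLLLoss()(output, torch.tensor([1, 0, 5]))      # would raise: 5 is out of range for 5 classes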

I assume you wanted to pass a one-hot vector as the target as well. PyTorch doesn't work that way, as it's memory-inefficient (why store everything one-hot encoded when you can pinpoint the class exactly; in your case it would be 0).

Only the outputs of the neural network are one-hot encoded, in order to backpropagate the error through all output nodes; it's not needed for the targets.
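
If labels do arrive one-hot encoded, a single argmax converts them to the class indices NLLLoss expects (a small sketch):

import torch

one_hot = torch.tensor([[1, 0, 0],
                        [0, 0, 1]])  # two one-hot labels
indices = one_hot.argmax(dim=1)      # tensor([0, 2]): class index per sample
print(indices)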

You shouldn't use torch.nn.LogSoftmax at all for this task. Just use torch.nn.Linear as the last layer and torch.nn.CrossEntropyLoss with your targets.
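
A minimal sketch of that setup (the layer sizes here are illustrative): torch.nn.CrossEntropyLoss applies LogSoftmax and NLLLoss internally, so the network just outputs raw logits:

import torch
import torch.nn as nn

model = nn.Linear(10, 5)           # 10 input features -> 5 classes (raw logits)
criterion = nn.CrossEntropyLoss()  # LogSoftmax + NLLLoss in one step

inputs = torch.randn(4, 10)           # batch of 4 samples
targets = torch.tensor([0, 3, 1, 4])  # class index per sample
loss = criterion(model(inputs), targets)
loss.backward()
print(loss.item())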
