What are C classes for a NLLLoss loss function in Pytorch?


Question

I'm asking about the C classes for a NLLLoss loss function.

The documentation states:

The negative log likelihood loss. It is useful to train a classification problem with C classes.

Basically everything after that point depends on your knowing what a C class is, and I thought I knew what a C class was, but the documentation doesn't make much sense to me. Especially when it describes the expected inputs of (N, C) where C = number of classes. That's where I'm confused, because I thought a C class referred to the output only. My understanding was that a C class was a one-hot vector of classifications. I've often found in tutorials that NLLLoss is paired with a LogSoftmax to solve classification problems.

I was expecting to use NLLLoss in the following example:

import torch
import torch.nn as nn

# Some random training data
input = torch.randn(5, requires_grad=True)
print(input)  # tensor([-1.3533, -1.3074, -1.7906,  0.3113,  0.7982], requires_grad=True)
# Build my NN (here it's just a LogSoftmax)
m = nn.LogSoftmax(dim=0)
# Train my NN with the data
output = m(input)
print(output)  # tensor([-2.8079, -2.7619, -3.2451, -1.1432, -0.6564], grad_fn=<LogSoftmaxBackward>)
loss = nn.NLLLoss()
print(loss(output, torch.tensor([1, 0, 0])))

The above raises the following error on the last line:

ValueError: Expected 2 or more dimensions (got 1)

We can ignore the error, because clearly I don't understand what I'm doing. Here I'll explain the intent of the source code above.

input = torch.randn(5, requires_grad=True)

A random 1D array to pair with the one-hot vector [1, 0, 0] for training. I'm trying to map binary bits to one-hot vectors of decimal numbers.

m = nn.LogSoftmax(dim=0)

The documentation for LogSoftmax says that the output will be the same shape as the input, but I've only seen examples of LogSoftmax(dim=1), so I've been stuck trying to make this work because I can't find a comparable example.

print(loss(output, torch.tensor([1, 0, 0])))

So now I have the output of the NN, and I want to know the loss for my classification [1, 0, 0]. It doesn't really matter in this example what any of the data is. I just want a loss for a one-hot vector that represents a classification.

At this point I get stuck trying to resolve errors from the loss function relating to the expected output and input structures. I've tried using view(...) on the output and input to fix the shapes, but that just gets me other errors.

So this goes back to my original question, and I'll show the example from the documentation to explain my confusion:

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
input = torch.randn(3, 5, requires_grad=True)
train = torch.tensor([1, 0, 4])
print('input', input)  # input tensor([[...],[...],[...]], requires_grad=True)
output = m(input)
print('train', output, train)  # tensor([[...],[...],[...]],grad_fn=<LogSoftmaxBackward>) tensor([1, 0, 4])
x = loss(output, train)

Again, we have dim=1 on LogSoftmax, which confuses me now, because look at the input data. It's a 3x5 tensor and I'm lost.

Here's the documentation on the first input for the NLLLoss function:

Input: (N, C) where C = number of classes

Are the inputs grouped by the number of classes?

So each row of the tensor input is associated with each element of the training tensor?

If I change the second dimension of the input tensor, then nothing breaks and I don't understand what is going on.

input = torch.randn(3, 100, requires_grad=True)
# 3 x 100 still works?

So I don't understand what a C class is here. I thought a C class was a classification (like a label), meaningful only for the outputs of the NN.

I hope you understand my confusion, because shouldn't the shape of the inputs for the NN be independent of the shape of the one-hot vector used for classification?

Both the code examples and the documentation say that the shape of the inputs is defined by the number of classifications, and I don't really understand why.

I have tried studying the documentation and tutorials to understand what I'm missing, but after several days of not being able to get past this point I've decided to ask this question. It's been humbling, because I thought this was going to be one of the easier things to learn.

Answer

Basically you are missing the concept of batch.

Long story short, every input to the loss (and everything passed through the network) requires a batch dimension (i.e. how many samples are used).
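
To make the expected shapes concrete, here is a minimal sketch (the sizes are arbitrary, picked only for illustration): NLLLoss wants log-probabilities of shape (N, C) and one class index per sample, of shape (N,).

import torch
import torch.nn as nn

loss = nn.NLLLoss()
# input: (N, C) = (4 samples, 3 classes), already log-probabilities
log_probs = nn.LogSoftmax(dim=1)(torch.randn(4, 3))
# target: (N,) = one class index per sample, each in [0, C-1]
target = torch.tensor([0, 2, 1, 0])
print(loss(log_probs, target))  # a single scalar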

Breaking it up, step by step:

Each step is compared side by side to make it clearer (the documentation's example on top, yours below).

Inputs

input = torch.randn(3, 5, requires_grad=True)  # docs
input = torch.randn(5, requires_grad=True)     # yours

In the first case (docs), an input with 5 features is created and 3 samples are used. In your case there is only the batch dimension (5 samples); you have none of the features that are required. If you meant to have one sample with 5 features you should do:

input = torch.randn(1, 5, requires_grad=True)

LogSoftmax

LogSoftmax is applied across the features dimension; you are applying it across the batch.

m = nn.LogSoftmax(dim=1)  # apply over features
m = nn.LogSoftmax(dim=0)  # apply over batch

This operation usually makes no sense over the batch, as the samples are independent of each other.
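
A quick way to see the difference (a small sketch; the 2x3 tensor is arbitrary): with dim=1 each row, i.e. each sample, becomes its own probability distribution, while dim=0 normalizes across unrelated samples.

x = torch.randn(2, 3)  # 2 samples, 3 classes
print(nn.LogSoftmax(dim=1)(x).exp().sum(dim=1))  # tensor([1., 1.]): each sample sums to 1
print(nn.LogSoftmax(dim=0)(x).exp().sum(dim=0))  # each column sums to 1, mixing samples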

Targets

As this is multiclass classification and each element in the vector represents a sample, one can pass as many numbers as one wants (as long as each is smaller than the number of features; in the documentation example that's 5, hence [0-4] is fine).

train = torch.tensor([1, 0, 4])  # docs
train = torch.tensor([1, 0, 0])  # yours

I assume you wanted to pass a one-hot vector as the target as well. PyTorch doesn't work that way, as it's memory inefficient (why store everything one-hot encoded when you can just pinpoint the class exactly; in your case that would be 0).

Only the outputs of the neural network are one-hot encoded, in order to backpropagate error through all the output nodes; it's not needed for the targets.
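
Putting those pieces together, here is a corrected version of your original example (a sketch; it assumes your [1, 0, 0] was meant as the one-hot encoding of class 0, padded here to the 5 classes the input actually has):

input = torch.randn(1, 5, requires_grad=True)  # 1 sample, 5 features/classes
output = nn.LogSoftmax(dim=1)(input)           # over features, not over the batch
one_hot = torch.tensor([1, 0, 0, 0, 0])        # the intended one-hot target
target = one_hot.argmax().unsqueeze(0)         # tensor([0]): a class index, shape (N,)
loss = nn.NLLLoss()
print(loss(output, target))                    # now a valid (input, target) pair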

You shouldn't use torch.nn.LogSoftmax at all for this task. Just use torch.nn.Linear as the last layer and use torch.nn.CrossEntropyLoss with your targets.
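
For instance, a minimal sketch of that setup (the layer sizes are arbitrary): torch.nn.CrossEntropyLoss applies LogSoftmax and NLLLoss internally, so the network simply ends in a Linear layer and outputs raw scores (logits):

model = nn.Linear(10, 5)           # last layer: 10 input features -> 5 classes
criterion = nn.CrossEntropyLoss()  # LogSoftmax + NLLLoss in one step
x = torch.randn(3, 10)             # batch of 3 samples
target = torch.tensor([1, 0, 4])   # class indices, shape (N,)
logits = model(x)                  # shape (3, 5); no softmax needed
print(criterion(logits, target))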
