How can I use LSTM in pytorch for classification?


Problem description


My code is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class Mymodel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers, batch_size):
        super(Mymodel, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers
        self.batch_size = batch_size

        # Pass num_layers so the LSTM matches the hidden-state shapes in init_hidden
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers)
        self.proj = nn.Linear(hidden_size, output_size)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        # (h_0, c_0), each of shape (num_layers, batch_size, hidden_size)
        return (Variable(torch.zeros(self.num_layers, self.batch_size, self.hidden_size)),
                Variable(torch.zeros(self.num_layers, self.batch_size, self.hidden_size)))

    def forward(self, x):
        # lstm_out has one output vector per timestep: (seq_len, batch_size, hidden_size)
        lstm_out, self.hidden = self.lstm(x, self.hidden)
        output = self.proj(lstm_out)
        result = F.sigmoid(output)
        return result

I want to use an LSTM to classify a sentence as good (1) or bad (0). With this code, I get a result of shape time_step * batch_size * 1 rather than 0 or 1. How should I edit the code to get the classification result?

Solution

Theory:

Recall that an LSTM outputs a vector for every input in the series. You are using sentences, which are a series of words (probably converted to indices and then embedded as vectors). This code from the LSTM PyTorch tutorial makes clear exactly what I mean (***emphasis mine):

import torch
import torch.autograd as autograd
import torch.nn as nn

lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
inputs = [autograd.Variable(torch.randn((1, 3)))
          for _ in range(5)]  # make a sequence of length 5

# initialize the hidden state.
hidden = (autograd.Variable(torch.randn(1, 1, 3)),
          autograd.Variable(torch.randn((1, 1, 3))))
for i in inputs:
    # Step through the sequence one element at a time.
    # after each step, hidden contains the hidden state.
    out, hidden = lstm(i.view(1, 1, -1), hidden)

# alternatively, we can do the entire sequence all at once.
# the first value returned by LSTM is all of the hidden states throughout
# the sequence. the second is just the most recent hidden state
# *** (compare the last slice of "out" with "hidden" below, they are the same)
# The reason for this is that:
# "out" will give you access to all hidden states in the sequence
# "hidden" will allow you to continue the sequence and backpropagate,
# by passing it as an argument  to the lstm at a later time
# Add the extra 2nd dimension
inputs = torch.cat(inputs).view(len(inputs), 1, -1)
hidden = (autograd.Variable(torch.randn(1, 1, 3)),
          autograd.Variable(torch.randn((1, 1, 3))))  # clean out hidden state
out, hidden = lstm(inputs, hidden)
print(out)
print(hidden)

One more time: compare the last slice of "out" with "hidden" below, they are the same. Why? Well...
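As a quick sanity check (reusing the tutorial variables above, and assuming the old Variable-based API shown there), you can verify the claim directly:

# out[-1] is the output at the last timestep; hidden[0] is h_n.
# For a single-layer LSTM they hold the same values.
print(torch.equal(out[-1].data, hidden[0][0].data))  # prints: True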

If you're familiar with LSTMs, I'd recommend the PyTorch LSTM docs at this point. Under the output section, notice that h_t is output at every t.

Now if you aren't used to LSTM-style equations, take a look at Chris Olah's LSTM blog post and scroll down to the diagram of the unrolled network.

As you feed your sentence in word-by-word (x_i-by-x_i+1), you get an output from each timestep. You want to interpret the entire sentence to classify it. So you must wait until the LSTM has seen all the words. That is, you need to take h_t where t is the number of words in your sentence.

Code:

Here's a coding reference. I'm not going to copy-paste the entire thing, just the relevant parts. The magic happens at self.hidden2label(lstm_out[-1]).

class LSTMClassifier(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, vocab_size, label_size, batch_size):
        ...
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)
        self.hidden2label = nn.Linear(hidden_dim, label_size)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        return (autograd.Variable(torch.zeros(1, self.batch_size, self.hidden_dim)),
                autograd.Variable(torch.zeros(1, self.batch_size, self.hidden_dim)))

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        x = embeds.view(len(sentence), self.batch_size, -1)
        lstm_out, self.hidden = self.lstm(x, self.hidden)
        # lstm_out[-1] is the output at the last timestep: the LSTM has now
        # seen the whole sentence, so this single vector summarizes it.
        y = self.hidden2label(lstm_out[-1])
        log_probs = F.log_softmax(y, dim=1)  # one log-probability per label
        return log_probs
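
To turn the log-probabilities into a 0/1 label, take the argmax over the label dimension. Here is a minimal usage sketch: the dimensions are hypothetical, and it assumes the elided __init__ also stores batch_size and hidden_dim as self.batch_size and self.hidden_dim.

model = LSTMClassifier(embedding_dim=32, hidden_dim=50,
                       vocab_size=1000, label_size=2, batch_size=1)

# A dummy 4-word sentence as word indices, shape (seq_len,)
sentence = autograd.Variable(torch.LongTensor([4, 8, 15, 16]))

log_probs = model(sentence)              # shape: (batch_size, label_size)
_, predicted = torch.max(log_probs, 1)   # index of the largest log-probability
print(predicted)                         # 0 = bad, 1 = good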
