LSTM/RNN in PyTorch: the relation between the forward method and the training model

Problem Description

I'm still fairly new to neural networks, so apologies in advance for any ambiguities in the following.

In a "standard" LSTM implementation for a language task, we have the following (sorry for the very rough sketches):

class LSTM(nn.Module):
    def __init__(self, *args):
        ...

    def forward(self, input, states):
        lstm_in = self.model['embed'](input)
        lstm_out, hidden = self.model['lstm'](lstm_in, states)

        return lstm_out, hidden

Later on, we call this model in the training step:

def train(*args):
    for epoch in range(epochs):
        ...
        *init_zero_states
        ...
        out, states = model(input, states)
        ...
    return model

Let's just say that I have 3 sentences as input:

sents = [["The", "sun", "is", "shiny"],
         ["The", "beach", "was", "very", "windy"],
         ["Computer", "broke", "down", "today"]]

model = train(LSTM, sents)

All words in all sentences get converted to embeddings and loaded into the model.
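
As a minimal sketch of that conversion (not part of the original code; word_to_idx is a hypothetical vocabulary mapping, and the padding scheme is just one common choice), the sentences could be turned into a padded index batch like this:

import torch
from torch.nn.utils.rnn import pad_sequence

# hypothetical vocabulary: map every word to an integer index
word_to_idx = {}
for sent in sents:
    for w in sent:
        word_to_idx.setdefault(w, len(word_to_idx))

# each sentence becomes a 1-D tensor of word indices
indexed = [torch.tensor([word_to_idx[w] for w in sent]) for sent in sents]

# sentences have different lengths, so pad them into a single batch tensor
batch = pad_sequence(indexed, batch_first=True, padding_value=0)
print(batch.shape)  # torch.Size([3, 5]) -> (batch, longest sentence length)

The nn.Embedding layer then maps each index in this batch to its embedding vector.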

Now the questions:

  1. Does self.model['lstm'] iterate through all words from all sentences and produce one output after every word, or one after every sentence?

  2. How does the model distinguish between the 3 sentences? For example, after getting "The", "sun", "is", "shiny", does something in the LSTM (such as the states) reset and begin anew?

出"在退出之后的训练步骤中,states = model(input,states)是运行完所有3个句子之后的输出,因此组合了"information".全部三个句子中的全部?

The "out" in the train step after out, states = model(input, states) is the output after running all 3 sentences and hence the combined "information" from all 3 sentences?

Thanks!

Answer

When using LSTMs in PyTorch you usually use the nn.LSTM module. Here is a quick example, followed by an explanation of what happens inside:

import torch
import torch.nn as nn

# vocab_size, embed_size, hidden_size, num_layers and output_size are
# assumed to be hyperparameters defined elsewhere

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()

        self.embedder = nn.Embedding(vocab_size, embed_size)

        # the LSTM consumes the embedded words, so its input size is embed_size
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.embedder(x)

        # every time you pass a new batch of sentences into the model you need
        # to create a new hidden state (an LSTM, unlike a vanilla RNN, needs
        # two states in a tuple: the hidden state and the cell state)
        hidden = (torch.zeros(num_layers, x.size(0), hidden_size),
                  torch.zeros(num_layers, x.size(0), hidden_size))
        x, hidden = self.lstm(x, hidden)

        # x contains the output states of every timestep;
        # for classification we mostly just want the last one
        x = x[:, -1]

        x = self.fc(x)
        x = self.softmax(x)
        return x
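
As a quick usage sketch (the hyperparameter values below are made up just to make it runnable): because forward creates fresh zero states on every call, the states effectively start anew for each new batch of sentences.

# hypothetical hyperparameter values for the sketch above
vocab_size, embed_size, hidden_size, num_layers, output_size = 11, 10, 20, 1, 2

model = Model()
batch = torch.randint(0, vocab_size, (3, 5))  # 3 sentences, padded to length 5
probs = model(batch)
print(probs.shape)  # torch.Size([3, 2]) -> one class distribution per sentence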

So, when taking a look at the nn.LSTM module, you see that all N embedded words are passed into it at once and you get all N outputs (one from every timestep) back. That means that inside the lstm function it iterates over all words in the sentence embeddings; we just don't see that in the code. It also returns the final hidden state (and cell state), but you don't have to use them further; in most cases you can just ignore them.
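
A minimal shape check illustrates this (the sizes are hypothetical: 3 sentences padded to 5 timesteps, embedding size 10, hidden size 20):

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=1, batch_first=True)
x = torch.randn(3, 5, 10)   # (batch, seq_len, embed_size)
out, (h_n, c_n) = lstm(x)   # omitting the states defaults them to zeros

print(out.shape)  # torch.Size([3, 5, 20]) -> one output per timestep
print(h_n.shape)  # torch.Size([1, 3, 20]) -> final hidden state per sequence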

As pseudocode:

def lstm(x):
    hiddenstate = init_with_zeros()
    outputs = []
    for e in x:
        output, hiddenstate = neuralnet(e, hiddenstate)
        outputs.append(output)

    return outputs, hiddenstate

sentence = ["the", "sun", "is", "shiny"]
sentence = embedding(sentence)

outputs, hiddenstates = lstm(sentence)
