Understanding stateful LSTM

Problem description

I'm going through this tutorial on RNNs/LSTMs and I'm having quite a hard time understanding stateful LSTMs. My questions are as follows:

In the Keras docs on RNNs, I found out that the hidden state of the sample in i-th position within the batch will be fed as the input hidden state for the sample in i-th position in the next batch. Does that mean that if we want to pass the hidden state from sample to sample we have to use batches of size 1 and therefore perform online gradient descent? Is there a way to pass the hidden state within a batch of size > 1 and perform gradient descent on that batch?
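
To make this concrete, below is a minimal sketch (with made-up data and hypothetical sizes, not code from the tutorial) of a stateful LSTM trained with a batch size of 4: row i of every batch is the next chunk of sequence i, so each sequence keeps its own hidden state across batches while the gradient is still computed over all 4 samples at once.

    import numpy as np
    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    batch_size, timesteps, features = 4, 10, 1    # assumed toy dimensions

    model = Sequential()
    model.add(LSTM(16, batch_input_shape=(batch_size, timesteps, features),
                   stateful=True))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer='adam')

    # 4 long sequences of 50 steps, split into 5 consecutive chunks of 10 steps.
    # Row i of chunk t and row i of chunk t+1 belong to the same sequence, so the
    # state carried over between batches always stays within one sequence.
    long_seqs = np.random.rand(batch_size, 50, features)      # placeholder data
    chunks = np.split(long_seqs, 5, axis=1)

    for epoch in range(3):
        for chunk in chunks:
            y = chunk[:, -1, :]                                # placeholder targets
            model.train_on_batch(chunk, y)                     # state flows into the next chunk
        model.reset_states()                                   # start the sequences over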

In the tutorial's paragraph 'Stateful LSTM for a One-Char to One-Char Mapping', code is given that uses batch_size = 1 and stateful = True to learn to predict the next letter of the alphabet given a letter of the alphabet. In the last part of the code (line 53 to the end of the complete code), the model is tested starting with a random letter ('K'): it predicts 'B', then given 'B' it predicts 'C', etc. It seems to work well except for 'K'. However, I tried the following tweak to the code (the last part only; I kept lines 52 and above):

    # demonstrate a random starting point
    letter1 = "M"
    seed1 = [char_to_int[letter1]]
    x = numpy.reshape(seed1, (1, len(seed1), 1))   # shape: (batch, timesteps, features)
    x = x / float(len(alphabet))                   # normalize as in the tutorial
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    print(int_to_char[seed1[0]], "->", int_to_char[index])
    letter2 = "E"
    seed2 = [char_to_int[letter2]]
    seed = seed2
    print("New start: ", letter1, letter2)
    for i in range(0, 5):
        x = numpy.reshape(seed, (1, len(seed), 1))
        x = x / float(len(alphabet))
        prediction = model.predict(x, verbose=0)
        index = numpy.argmax(prediction)
        print(int_to_char[seed[0]], "->", int_to_char[index])
        seed = [index]                             # feed the prediction back in
    model.reset_states()

and got the following output:

    M -> B
    New start: M E
    E -> C
    C -> D
    D -> E
    E -> F

It looks like the LSTM did not learn the alphabet but just the positions of the letters, and that regardless of the first letter we feed in, the LSTM will always predict B since it's the second letter, then C and so on.

Therefore, how does keeping the previous hidden state as the initial hidden state for the current batch help with learning, given that at test time, if we start with the letter 'K' for example, the letters A to J will not have been fed in beforehand and the initial hidden state will not be the same as during training?

I want to train my LSTM on a whole book to learn how to generate sentences and perhaps learn the author's style too. How can I naturally train my LSTM on that text (input the whole text and let the LSTM figure out the dependencies between the words) instead of having to 'artificially' create batches of sentences from that book myself to train my LSTM on? I believe stateful LSTMs could help, but I'm not sure how.

Answer

  1. Having a stateful LSTM in Keras means that a Keras variable will be used to store and update the state, and in fact you could check the value of the state vector(s) at any time (that is, until you call reset_states()). A non-stateful model, on the other hand, will use an initial zero state every time it processes a batch, so it is as if you always called reset_states() after train_on_batch, test_on_batch and predict_on_batch. The explanation about the state being reused for the next batch on stateful models is just about that difference with non-stateful; of course the state will always flow within each sequence in the batch and you do not need to have batches of size 1 for that to happen. I see two scenarios where stateful models are useful:

  • You want to train on splits of a data sequence, because the sequences are very long and training on their whole length is not practical.
  • At prediction time, you want to retrieve the output at each point in time in the sequence, not just at the end (because you want to feed it back into the network or because your application needs it). I personally do that in models that I export for later integration (which are "copies" of the model used for training, with a batch size of 1). A sketch of this export pattern follows the list.
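
For instance, the pattern mentioned in the second point could look roughly like this (hypothetical layer sizes, not code from an actual project):

    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    n_units, timesteps, features = 32, 10, 1      # assumed sizes

    # Model used for training, with whatever batch size is convenient.
    train_model = Sequential()
    train_model.add(LSTM(n_units, batch_input_shape=(16, timesteps, features),
                         stateful=True))
    train_model.add(Dense(1))
    # ... compile and fit train_model here ...

    # "Copy" for deployment: same architecture, but batch size 1 and one
    # timestep per call, so the output can be read back at every step.
    pred_model = Sequential()
    pred_model.add(LSTM(n_units, batch_input_shape=(1, 1, features),
                        stateful=True))
    pred_model.add(Dense(1))
    pred_model.set_weights(train_model.get_weights())   # reuse the trained weights

    # pred_model.predict() can now be called step by step; the internal state
    # accumulates until pred_model.reset_states() is called.
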
  2. I agree that the example of an RNN for the alphabet does not really seem very useful in practice; it will only work when you start with the letter A. If you want to learn to reproduce the alphabet starting at any letter, you would need to train the network with that kind of example (subsequences or rotations of the alphabet). But I think a regular feed-forward network could learn to predict the next letter of the alphabet by training on pairs like (A, B), (B, C), etc. I think the example is meant for demonstrative purposes more than anything else.
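
As an illustration, one way such training examples could be generated (a sketch reusing the tutorial's alphabet and char_to_int names, with a hypothetical seq_length) is to take windows from every rotation of the alphabet:

    # Build (input sequence, next letter) pairs from all rotations of the
    # alphabet, so every letter shows up as a possible starting point.
    alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    char_to_int = dict((c, i) for i, c in enumerate(alphabet))

    seq_length = 3                                   # assumed window size
    dataX, dataY = [], []
    for start in range(len(alphabet)):
        rotated = alphabet[start:] + alphabet[:start]
        for i in range(len(rotated) - seq_length):
            seq_in = rotated[i:i + seq_length]
            seq_out = rotated[i + seq_length]
            dataX.append([char_to_int[c] for c in seq_in])
            dataY.append(char_to_int[seq_out])
    # dataX/dataY can then be reshaped, normalized and one-hot encoded exactly
    # as in the tutorial before training.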

  3. You may have already read it, but the popular post The Unreasonable Effectiveness of Recurrent Neural Networks shows some interesting results along the lines of what you want to do (although it does not really dive into implementation specifics). I don't have personal experience training RNNs with textual data, but there are a number of approaches you can research. You can build character-based models (like the ones in the post), where you input and receive one character at a time. A more advanced approach is to do some preprocessing on the texts and transform them into sequences of numbers; Keras includes some text preprocessing functions to do that. Having one single number as the feature space is probably not going to work all that well, so you could simply turn each word into a vector with one-hot encoding or, more interestingly, have the network learn the best vector representation for each word, which is what they call an embedding. You can go even further with the preprocessing and look into something like NLTK, especially if you want to remove stop words, punctuation and things like that. Finally, if you have sequences of different sizes (e.g. you are using full texts instead of excerpts of a fixed size, which may or may not matter to you), you will need to be a bit more careful and use masking and/or sample weighting. Depending on the exact problem, you can set up the training accordingly. If you want to learn to generate similar text, the "Y" would be similar to the "X" (one-hot encoded), only shifted by one (or more) positions (in this case you may need to use return_sequences=True and TimeDistributed layers). If you want to determine the author, your output could be a softmax Dense layer.
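
To tie those pieces together, here is a minimal sketch (assumed vocabulary size, sequence length and layer dimensions, with placeholder data standing in for a real book) of the "Y is X shifted by one position" setup with an embedding and TimeDistributed softmax outputs:

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Embedding, LSTM, TimeDistributed, Dense

    vocab_size, seq_len = 5000, 40                    # assumed values

    model = Sequential()
    model.add(Embedding(vocab_size, 128, input_length=seq_len))
    model.add(LSTM(256, return_sequences=True))       # one output per timestep
    model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')

    # tokens would come from integer-encoding the book, e.g. with
    # keras.preprocessing.text.Tokenizer; random integers stand in here.
    tokens = np.random.randint(1, vocab_size, size=(100, seq_len + 1))
    X = tokens[:, :-1]                                # words 0 .. n-1
    Y = tokens[:, 1:, None]                           # words 1 .. n (shifted by one)
    model.fit(X, Y, batch_size=32, epochs=1)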

Hope that helps.
