Doubts regarding `Understanding Keras LSTMs`


Question


I am new to LSTMs, was going through Understanding Keras LSTMs, and have some doubts related to a beautiful answer by Daniel Moller.

Here are some of my doubts:

  1. There are 2 ways specified under the Achieving one to many section where it’s written that we can use stateful=True to recurrently take the output of one step and serve it as the input of the next step (needs output_features == input_features).

    In the One to many with repeat vector diagram, the repeated vector is fed as input at all the time-steps, whereas in One to many with stateful=True the output is fed as input at the next time step. So, aren't we changing the way the layers work by using stateful=True?

    Which of the above 2 approaches (using the repeat vector OR feeding the previous time-step output as the next input) should be followed when building an RNN?

  2. Under the One to many with stateful=True section, to change the behaviour of one to many, in the code with the manual loop for prediction, how will we know the steps_to_predict variable, since we don't know the output sequence length in advance?

    I also did not understand the way the entire model uses the last_step output to generate the next_step output. It has confused me about the working of the model.predict() function. I mean, doesn't model.predict() predict the entire output sequence at once, rather than looping through the number of output steps to be generated (whose value I still don't know) and calling model.predict() to predict a specific time-step's output in a given iteration?

  3. I couldn't understand the Many to many case at all. Any other link would be helpful.

  4. I understand that we use model.reset_states() to make sure that a new batch is independent of the previous batch. But do we manually create batches of a sequence such that one batch follows another, or does Keras in stateful=True mode automatically divide the sequence into such batches?

    If it's done manually, then why would anyone divide the dataset into batches in which one part of a sequence is in one batch and the rest in the next batch?

  5. At last, what are the practical implementations or examples/use-cases where stateful=True would be used (because this seems to be something unusual)? I am learning LSTMs and this is the first time I've been introduced to stateful in Keras.

Can anyone help explain my questions so that I can be clear about the LSTM implementation in Keras?

EDIT: Some of these questions ask for clarification of the current answer, and some are about the remaining doubts.

A. So, basically stateful lets us keep OR reset the inner state after every batch. Then, how would the model learn if we keep resetting the inner state again and again after each trained batch? Does resetting truly mean resetting the parameters (used in computing the hidden state)?

B. In the line If stateful=False: automatically resets inner state, resets last output step. What did you mean by resetting the last output step? I mean, if every time-step produces its own output, then what does resetting the last output step mean, and why only the last one?

C. In response to Question 2 and the 2nd point of Question 4, I still didn't get your manipulate the batches between each iteration and the need for stateful (last line of Question 2), which only resets the states. I got to the point that we don't know the input for every output generated in a time-step.

So, you break the sequences into sequences of only one step and then use new_step = model.predict(last_step), but then how do you know how long you need to keep doing this again and again (there must be a stopping point for the loop)? Also, do explain the stateful part (in the last line of Question 2).

D. In the code under One to many with stateful=True, it seems that the for loop (manual loop) used for predicting the next word is used only at test time. Does the model incorporate that itself at train time, or do we need to manually use this loop at train time as well?

E. Suppose we are doing some machine translation job. I think the breaking of sequences will occur after the entire input (the language to translate) has been fed into the input time-steps, and then the generation of outputs (the translated language) at each time-step will take place via the manual loop, because now we have run out of inputs and start producing the output at each time-step using the iteration. Did I get it right?

F. Since the default working of LSTMs requires the 3 things mentioned in the answer, in the case of breaking sequences, are current_input and previous_output fed the same vectors, because their values are the same when no current input is available?

G. Under many to many with stateful=True, in the Predicting: section, the code reads:

predicted = model.predict(totalSequences)
firstNewStep = predicted[:,-1:]

Since the manual loop for finding the very next word in the current sequence hasn't been used up to this point, how do I know the count of time-steps that have been predicted by model.predict(totalSequences), so that the last step from predicted (predicted[:,-1:]) can later be used for generating the rest of the sequences? I mean, how do I know the number of steps that have been produced by predicted = model.predict(totalSequences) before the manual for loop is used?

EDIT 2:

I. From the D answer I still didn't get how I will train my model. I understand that using the manual loop (during training) can be quite painful, but then if I don't use it, how will the model get trained in the circumstances where we want the 10 future steps, we cannot output them at once because we don't have the necessary 10 input steps? Will simply using model.fit() solve my problem?

II. The D answer's last paragraph says: You could train step by step using train_on_batch only in the case you have the expected outputs of each step. But otherwise I think it's very complicated or impossible to train.

Can you explain this in more detail?

What does step by step mean? Whether I do or don't have the outputs for the later steps, how will that affect my training? Do I still need the manual loop during training? If not, will the model.fit() function work as desired?

III. I interpreted the "repeat" option as using the repeat vector. Wouldn't using the repeat vector be good only for the one to many case, and not suitable for the many to many case, because the latter will have many input vectors to choose from (to be used as a single repeated vector)? How would you use the repeat vector in the many to many case?

Solution

Question 3

Understanding question 3 is sort of a key to understanding the others, so let's try it first.

All recurrent layers in Keras perform hidden loops. These loops are totally invisible to us, but we can see the results of each iteration at the end.

The number of invisible iterations is equal to the time_steps dimension. So, the recurrent calculations of an LSTM happen across these steps.

If we pass an input with X steps, there will be X invisible iterations.

Each iteration in an LSTM will take 3 inputs:

  • The respective slice of the input data for this step
  • The inner state of the layer
  • The output of the last iteration

So, take the following example image, where our input has 5 steps:

What will Keras do in a single prediction?

  • Step 0:
    • Take the first step of the inputs, input_data[:,0,:], a slice shaped as (batch, 2)
    • Take the inner state (which is zero at this point)
    • Take the last output step (which doesn't exist for the first step)
    • Pass through the calculations to:
      • Update the inner state
      • Create one output step (output 0)
  • Step 1:
    • Take the next step of the inputs: input_data[:,1,:]
    • Take the updated inner state
    • Take the output generated in the last step (output 0)
    • Pass through the same calculation to:
      • Update the inner state again
      • Create one more output step (output 1)
  • Step 2:
    • Take input_data[:,2,:]
    • Take the updated inner state
    • Take output 1
    • Pass through:
      • Update the inner state
      • Create output 2
  • And so on until step 4.

  • Finally:

    • If stateful=False: automatically resets inner state, resets last output step
    • If stateful=True: keeps inner state, keeps last output step

You will not see any of these steps. It will look like just a single pass.
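
To make this invisible loop concrete, here is a minimal runnable sketch (all sizes are made up for illustration) using LSTMCell, the single-step building block behind LSTM, to perform the same recursion explicitly:

import tensorflow as tf

cell = tf.keras.layers.LSTMCell(4)                   # 4 units
input_data = tf.random.normal((3, 5, 2))             # (batch, steps, features)

states = [tf.zeros((3, 4)), tf.zeros((3, 4))]        # [last output, inner memory], zero at the start
outputs = []
for t in range(5):                                   # one hidden iteration per time step
    out, states = cell(input_data[:, t, :], states)  # this step's slice + the current states
    outputs.append(out)

print(tf.stack(outputs, axis=1).shape)               # (3, 5, 4)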

But you can choose between:

  • return_sequences = True: every output step is returned, shape (batch, steps, units)
    • This is exactly many to many. You get the same number of steps in the output as you had in the input
  • return_sequences = False: only the last output step is returned, shape (batch, units)
    • This is many to one. You generate a single result for the entire input sequence.
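
As a hedged illustration of those two options (layer sizes are arbitrary):

import numpy as np
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

inp = Input(shape=(5, 2))                            # 5 steps, 2 features
many_to_many = LSTM(4, return_sequences=True)(inp)
many_to_one = LSTM(4, return_sequences=False)(inp)
model = Model(inp, [many_to_many, many_to_one])

seq, last = model.predict(np.random.rand(3, 5, 2))
print(seq.shape)    # (3, 5, 4): one output per input step
print(last.shape)   # (3, 4): only the last output step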

Now, this answers the second part of your question 2: Yes, predict will compute everything without you noticing. But:

The number of output steps will be equal to the number of input steps

Question 4

Now, before going to question 2, let's look at 4, which is actually the base of the answer.

Yes, the batch division should be done manually. Keras will not change your batches. So, why would I want to divide a sequence?

  • 1, the sequence is too big, one batch doesn't fit the computer's or the GPU's memory
  • 2, you want to do what is happening on question 2: manipulate the batches between each step iteration.
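
A small sketch of case 1 (all names and sizes here are assumptions): one long sequence split into chunks that are fed as consecutive batches, with stateful=True so Keras treats the chunks as one continuous sequence:

import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

inp = Input(batch_shape=(1, None, 2))                # stateful layers need a fixed batch size
x = LSTM(16, return_sequences=True, stateful=True)(inp)
model = Model(inp, Dense(2)(x))
model.compile(optimizer='adam', loss='mse')

long_x = np.random.rand(1, 1000, 2)                  # hypothetically too big for one batch
long_y = np.random.rand(1, 1000, 2)
for start in range(0, 1000, 100):                    # 10 chunks of 100 steps each
    sl = slice(start, start + 100)
    model.train_on_batch(long_x[:, sl], long_y[:, sl])
model.reset_states()                                 # only reset when a new sequence starts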

Question 2

In question 2, we are "predicting the future". So, what is the number of output steps? Well, it's the number you want to predict. Suppose you're trying to predict the number of clients you will have based on the past. You can decide to predict for one month in the future, or for 10 months. Your choice.

Now, you're right to think that predict will calculate the entire thing at once, but remember question 3 above where I said:

The number of output steps is equal to the number of input steps

Also remember that the first output step is the result of the first input step, the second output step is the result of the second input step, and so on.

But we want the future, not something that matches the previous steps one by one. We want the result step to follow the "last" step.

So, we face a limitation: how to define a fixed number of output steps if we don't have their respective inputs? (The inputs for the distant future are also future, so, they don't exist)

That's why we break our sequence into sequences of only one step. So predict will also output only one step.

When we do this, we have the ability to manipulate the batches between each iteration. And we have the ability to take output data (which we didn't have before) as input data.

And stateful is necessary because we want each of these steps to be connected as a single sequence (don't discard the states).
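
Putting this together, a hedged sketch of the prediction loop (variable names such as steps_to_predict are assumptions, and the model is assumed to be already trained to predict the next step):

import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

inp = Input(batch_shape=(1, None, 2))                # fixed batch size for stateful
x = LSTM(16, return_sequences=True, stateful=True)(inp)
model = Model(inp, Dense(2)(x))                      # needs output_features == input_features

known_sequence = np.random.rand(1, 20, 2)            # the steps we actually have

predicted = model.predict(known_sequence)            # the states are kept afterwards
last_step = predicted[:, -1:]                        # the first step "after the last"

future = [last_step]
steps_to_predict = 10                                # your choice: the loop's stopping point
for _ in range(steps_to_predict - 1):
    last_step = model.predict(last_step)             # the output becomes the next input
    future.append(last_step)
model.reset_states()                                 # this sequence is finished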

Question 5

The best practical application of stateful=True that I know is the answer to question 2. We want to manipulate the data between steps.

This might be a dummy example, but another application is if you're for instance receiving data from a user on the internet. Each day the user uses your website, you give one more step of data to your model (and you want to continue this user's previous history in the same sequence).

Question 1

Then, finally question 1.

I'd say: always avoid stateful=True, unless you need it.
You don't need it to build a one to many network, so, better not use it.

Notice that the stateful=True example for this is the same as the predict the future example, but you start from a single step. It's hard to implement, and it will be slower because of the manual loops. But you can control the number of output steps, and this might be something you want in some cases.

There will be a difference in calculations too. And in this case I really can't answer if one is better than the other. But I don't believe there will be a big difference. But networks are some kind of "art", and testing might bring funny surprises.
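
For comparison, a minimal sketch of the "repeat" option (sizes assumed), which needs neither stateful=True nor a manual loop:

from tensorflow.keras.layers import Input, RepeatVector, LSTM, Dense
from tensorflow.keras.models import Model

inp = Input(shape=(5,))                   # one single input vector
x = RepeatVector(10)(inp)                 # the same vector fed at all 10 output steps
x = LSTM(16, return_sequences=True)(x)
out = Dense(2)(x)                         # (batch, 10, 2): one to many
model = Model(inp, out)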

Answers for EDIT:

A

We should not mistake "states" for "weights". They're two different variables.

  • Weights: the learnable parameters, they're never reset. (If you reset the weights, you lose everything the model learned)
  • States: the current memory of a batch of sequences (relates to which step of the sequence I am on now and what I have learned "from the specific sequences in this batch" up to this step).

Imagine you are watching a movie (a sequence). Every second makes you build memories, like the names of the characters, what they did, and what their relationships are.

Now imagine you get a movie you never saw before and start watching the last second of the movie. You will not understand the end of the movie because you need the previous story of this movie. (The states)

Now imagine you finished watching an entire movie. Now you will start watching a new movie (a new sequence). You don't need to remember what happened in the last movie you saw. If you try to "join the movies", you will get confused.

In this example:

  • Weights: your ability to understand and interpret movies, your ability to memorize important names and actions
  • States: on a paused movie, states are the memory of what happened from the beginning up to now.

So, states are "not learned". States are "calculated", built step by step regarding each individual sequence in the batch. That's why:

  • resetting states means starting new sequences from step 0 (starting a new movie)
  • keeping states means continuing the same sequences from the last step (continuing a movie that was paused, or watching part 2 of that story)

States are exactly what make recurrent networks work as if they had "memory from the past steps".
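
A tiny check (toy shapes) that reset_states() touches only the states, never the weights:

import numpy as np
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

inp = Input(batch_shape=(1, 5, 2))
model = Model(inp, LSTM(4, stateful=True)(inp))

w_before = model.get_weights()
model.predict(np.random.rand(1, 5, 2))    # builds up states ("watching the movie")
model.reset_states()                      # forget the movie, keep the ability to watch
w_after = model.get_weights()
print(all(np.array_equal(a, b) for a, b in zip(w_before, w_after)))   # True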

B

In an LSTM, the last output step is part of the "states".

An LSTM state contains:

  • a memory matrix updated every step by calculations
  • the output of the last step

So, yes: every step produces its own output, but every step uses the output of the last step as state. This is how an LSTM is built.

  • If you want to "continue" the same sequence, you want memory of the last step results
  • If you want to "start" a new sequence, you don't want memory of the last step results (these results will keep stored if you don't reset states)

C

You stop when you want. How many steps in the future do you want to predict? That's your stopping point.

Imagine I have a sequence with 20 steps. And I want to predict 10 steps in the future.

In a standard (non stateful) network, we can use:

  • input 19 steps at once (from 0 to 18)
  • output 19 steps at once (from 1 to 19)

This is "predicting the next step" (notice the shift = 1 step). We can do this because we have all the input data available.

But when we want the 10 future steps, we cannot output them at once because we don't have the necessary 10 input steps (these input steps are future, we need the model to predict them first).

So we need to predict one future step from existing data, then use this step as input for the next future step.

But I want these steps to all be connected. If I use stateful=False, the model will see a lot of "sequences of length 1". No, we want one sequence of length 30.
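
The shift = 1 training setup above, in code (toy data assumed):

import numpy as np

sequence = np.random.rand(1, 20, 2)   # the 20 known steps
x_train = sequence[:, :-1]            # steps 0 to 18
y_train = sequence[:, 1:]             # steps 1 to 19
# model.fit(x_train, y_train, ...) then teaches "predict the next step"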

D

This is a very good question and you got me ....

The stateful one to many was an idea I had when writing that answer, but I never used it. I prefer the "repeat" option.

You could train step by step using train_on_batch only in the case you have the expected outputs of each step. But otherwise I think it's very complicated or impossible to train.

E

That's one common approach.

  • Generate a condensed vector with a network (this vector can be a result, or the states generated, or both things)
  • Use this condensed vector as the initial input/state of another network, generate step by step manually, and stop when an "end of sentence" word or character is produced by the model.

There are also fixed-size models without the manual loop. You assume your sentence has a maximum length of X words. Result sentences shorter than this are completed with "end of sentence" or "null" words/characters. A Masking layer is very useful in these models.
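
A hedged sketch of that fixed-size variant (lengths and feature sizes assumed):

from tensorflow.keras.layers import Input, Masking, LSTM, Dense
from tensorflow.keras.models import Model

inp = Input(shape=(30, 50))               # at most 30 words, 50 features each
x = Masking(mask_value=0.0)(inp)          # all-zero "null" steps are skipped
x = LSTM(64, return_sequences=True)(x)
out = Dense(50, activation='softmax')(x)
model = Model(inp, out)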

F

You provide only the input. The other two things (last output and inner states) are already stored in the stateful layer.

I made the input = last output only because our specific model is predicting the next step. That's what we want it to do. For each input, the next step.

We taught this with the shifted sequence in training.

G

It doesn't matter. We want only the last step.

  • The number of sequences is kept by the first :.
  • And only the last step is considered by -1:.

But if you want to know, you can print predicted.shape. It is equal to totalSequences.shape in this model.
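
A quick shape check of that slice (toy array):

import numpy as np

predicted = np.zeros((8, 20, 2))    # (sequences, steps, features)
firstNewStep = predicted[:, -1:]
print(firstNewStep.shape)           # (8, 1, 2): all sequences, last step only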

Edit 2

I

First, we can't use "one to many" models to predict the future, because we don't have data for that. There is no possibility to understand a "sequence" if you don't have the data for the steps of the sequence.

So, this type of model should be used for other types of applications. As I said before, I don't really have a good answer for this question. It's better to have a "goal" first, then we decide which kind of model is better for that goal.

II

With "step by step" I mean the manual loop.

If you don't have the outputs of later steps, I think it's impossible to train. It's probably not a useful model at all. (But I'm not the one that knows everything)

If you have the outputs, yes, you can train the entire sequences with fit without worrying about manual loops.

III

And you're right about III. You won't use repeat vector in many to many because you have varying input data.

"One to many" and "many to many" are two different techniques, each one with their advantages and disadvantages. One will be good for certain applications, the other will be good for other applications.
