Confused about how to implement time-distributed LSTM + LSTM


Question

After much reading and diagramming, I think I've come up with a model that I can use as the foundation for more testing on which parameters and features I need to tweak. However, I am confused about how to implement the following test case (all numbers are orders of magnitude smaller than the final model, but I want to start small):

  • Input data: a 5000x1 time series vector, divided into 5 epochs of 1000x1
  • For each time step, feed 3 epochs of data into 3 time-distributed copies of a bidirectional LSTM layer, each outputting a 10x1 vector (extracting 10 features), which then serve as the input to a second bidirectional LSTM layer.
  • For each time step, the first and last labels are ignored, but the middle one is desired.

Here's what I've come up with, which does compile. However, looking at the model.summary, I think I'm missing the fact that I want the first LSTM to be run on 3 of the input sequences for each output time step. What am I doing wrong?

model = Sequential()
model.add(TimeDistributed(
    Bidirectional(LSTM(11, return_sequences=True, recurrent_dropout=0.1, unit_forget_bias=True),
                  input_shape=(3, 3, epoch_len), merge_mode='sum'),
    input_shape=(n_epochs, 3, epoch_len)))
model.add(TimeDistributed(Dense(7)))
model.add(TimeDistributed(Flatten()))
model.add(Bidirectional(LSTM(12, return_sequences=True, recurrent_dropout=0.1, unit_forget_bias=True), merge_mode='sum'))
model.add(TimeDistributed(Dense(n_classes, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Answer

Since your question is a bit confusing, I'll make the following assumptions.

  • You have one time series of 5000 time steps, each step with one feature - shape (1, 5000, 1)
  • The main part of the answer to your question: you want to run a "sliding window" case, with a window size of 3000 and a stride of 1000.
  • You want each window to be divided into 3 internal time series, each of these 3 series with 1000 steps and a single feature per step. Each of these series enters the same LSTM as an independent series (which is equivalent to having 3 copies of the LSTM) - shape (slidingWindowSteps, 3, 1000, 1)
  • Important: from these 3 series, you want 3 outputs without a length dimension, each with 10 features - shape (1, 3, 10). (Your image says 1x10, but your text says 10x1; I'm assuming the image is correct.)
  • You want these 3 outputs merged into a single sequence of 3 steps, shape (1, 3, 10)
  • You want the LSTM that processes this 3-step sequence to also return a 3-step sequence
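Before touching any model code, it may help to check the sliding-window arithmetic implied by these assumptions. A minimal sketch (variable names are mine, not from the question):

```python
# Sketch of the sliding-window shape arithmetic under the assumptions above.
# All names here are illustrative, not from the original question.
total_steps = 5000   # length of the original series
window_size = 3000   # 3 epochs of 1000 steps each
stride = 1000        # slide by one epoch

# Standard formula for the number of full windows that fit:
n_windows = (total_steps - window_size) // stride + 1   # 3 windows
groups_per_window = window_size // stride               # 3 groups of 1000

# Start index of each window:
starts = [w * stride for w in range(n_windows)]         # [0, 1000, 2000]
print(n_windows, groups_per_window, starts)
```

So the three windows start at steps 0, 1000, and 2000, and each contributes 3 groups of 1000 steps.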

Preparing for the sliding-window case:

In a sliding window case, duplicating data is unavoidable. You first need to prepare your input.

Taking the initial time series (1, 5000, 1), we need to split it properly into a batch of samples, each containing 3 groups of 1000 steps. Here I do this for X only; you will have to do something similar for Y.

numberOfOriginalSequences = 1
totalSteps = 5000
features = 1

#example of original input with 5000 steps
originalSeries = np.array(
                        range(numberOfOriginalSequences*totalSteps*features)
                 ).reshape((numberOfOriginalSequences,
                            totalSteps,
                            features))  

windowSize = 3000
windowStride = 1000

totalWindowSteps = ((totalSteps - windowSize)//windowStride) + 1

#at first, let's keep these dimensions for better understanding 
processedSequences = np.empty((numberOfOriginalSequences,
                               totalWindowSteps,
                               windowSize,
                               features))

for seq in range(numberOfOriginalSequences):
    for winStep in range(totalWindowSteps):
        start = winStep * windowStride
        end = start + windowSize
        processedSequences[seq,winStep,:,:] = originalSeries[seq,start:end,:]    

#now we reshape the array to transform each window step in independent sequences:
totalSamples = numberOfOriginalSequences*totalWindowSteps
groupsInWindow = windowSize // windowStride
processedSequences = processedSequences.reshape((totalSamples,
                                                 groupsInWindow,
                                                 windowStride,
                                                 features))

print(originalSeries)
print(processedSequences)
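As an aside, the same preparation can be done without explicit loops. A loop-free sketch, assuming your numpy is >= 1.20 (which provides `sliding_window_view`); this is an alternative I'm suggesting, not part of the original answer:

```python
import numpy as np

# Same windowing as the loop version above, but vectorized with
# numpy's sliding_window_view (numpy >= 1.20).
totalSteps, features = 5000, 1
windowSize, windowStride = 3000, 1000

originalSeries = np.arange(totalSteps * features).reshape((1, totalSteps, features))

# View of every possible window over the 1-D series,
# then keep one window per stride step:
windows = np.lib.stride_tricks.sliding_window_view(
    originalSeries[0, :, 0], windowSize)[::windowStride]   # shape (3, 3000)

groupsInWindow = windowSize // windowStride
vectorized = windows.reshape((-1, groupsInWindow, windowStride, features))
print(vectorized.shape)  # (3, 3, 1000, 1)
```

This produces exactly the same (totalSamples, groupsInWindow, windowStride, features) array as the loop, since each kept window starts at a multiple of the stride.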

Creating the model:

A few comments about your first added layer:

  • The model only takes one input_shape into account, and this shape is (groupsInWindow, windowStride, features). It should go in the outermost wrapper: the TimeDistributed.
  • You don't want to keep 1000 time steps; you want only 10 resulting features: return_sequences=False. (You can stack several LSTMs in this first stage if you want more layers. In that case the earlier ones can keep the steps; only the last one needs return_sequences=False.)
  • You want 10 features, so units=10.

I'll use the functional API just to see the input shapes in the summary, which helps with understanding things.

from keras.models import Model
from keras.layers import Input, LSTM, Dense, Bidirectional, TimeDistributed, Lambda

intermediateFeatures = 10

inputTensor = Input((groupsInWindow,windowStride,features))

out = TimeDistributed(
    Bidirectional(
        LSTM(intermediateFeatures, 
             return_sequences=False, 
             recurrent_dropout=0.1, 
             unit_forget_bias=True), 
        merge_mode='sum'))(inputTensor)

At this point, you have eliminated the 1000 time steps. Since we used return_sequences=False, there is no need to flatten or anything like that: the data is already shaped as (samples, groupsInWindow, intermediateFeatures). The Dense layer is also not necessary. But it wouldn't be "wrong" to do it the way you did, as long as the final shape is the same.

arbitraryLSTMUnits = 12
n_classes = 17

out = Bidirectional(
    LSTM(arbitraryLSTMUnits, 
         return_sequences=True, 
         recurrent_dropout=0.1, 
         unit_forget_bias=True), 
    merge_mode='sum')(out)

out = TimeDistributed(Dense(n_classes, activation='softmax'))(out)

And if you're going to discard the borders, you can add this layer:

out = Lambda(lambda x: x[:,1,:])(out) #model.add(Lambda(lambda x: x[:,1,:]))
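The slicing inside that Lambda simply keeps the middle of the 3 time steps. A toy numpy sketch of the same indexing (the shapes here are illustrative, not the model's real tensors):

```python
import numpy as np

# Toy demonstration of the x[:, 1, :] slice used in the Lambda layer:
# from a (samples, 3 steps, n_classes) tensor, keep only the middle step,
# discarding the first and last (the "borders").
n_samples, n_steps, n_classes = 4, 3, 17
x = np.random.rand(n_samples, n_steps, n_classes)

middle = x[:, 1, :]   # shape (4, 17): steps 0 and 2 are dropped
print(middle.shape)
```

Note that the step axis disappears, so the model's output becomes one prediction per window rather than a 3-step sequence.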

Finishing the model:

model = Model(inputTensor,out)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()


Here is how the dimensions flow through this model. The first dimension I put here (totalSamples) is shown as None in model.summary().

  • Input: (totalSamples, groupsInWindow, windowStride, features)
  • The time-distributed LSTM works like this:
    • TimeDistributed allows a 4th dimension, which is groupsInWindow. This dimension will be kept.
    • The LSTM with return_sequences=False eliminates windowStride and changes the features (windowStride, the second-to-last dimension, sits at the time-steps position for this LSTM).
    • Result: (totalSamples, groupsInWindow, intermediateFeatures)

