LSTM/GRU autoencoder convergence


Problem description

Trying to run an LSTM autoencoder over a dataset of multivariate time series:
X_train (200, 23, 178) - X_val (100, 23, 178) - X_test (100, 23, 178)

A plain autoencoder gets better results than a simple LSTM AE architecture.

I have some doubts about how I use the RepeatVector wrapper layer which, as far as I understand, is supposed to simply repeat the last state of the LSTM/GRU cell a number of times equal to the sequence length, in order to feed the input shape of the decoder layer.
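A minimal sketch of what RepeatVector does in this setup (the shapes come from the question; the latent size of 59 is an assumption read off the summaries below):

from tensorflow.keras import layers, Model

timesteps, n_features, latent_dim = 23, 178, 59

inputs = layers.Input(shape=(timesteps, n_features))
state = layers.GRU(latent_dim)(inputs)            # encoder: last hidden state only, (None, 59)
repeated = layers.RepeatVector(timesteps)(state)  # copied once per timestep, (None, 23, 59)
Model(inputs, repeated).summary()                 # the repeated tensor is what feeds the decoder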

The model architecture does not raise any error, but the results are still an order of magnitude worse than a simple AE, while I was expecting them to be at least the same, since I am using an architecture that should better fit the temporal problem.

Are these results comparable, first of all?

Nevertheless, the reconstruction error of the LSTM-AE does not look good at all.

Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 178)               31862     
_________________________________________________________________
batch_normalization (BatchNo (None, 178)               712       
_________________________________________________________________
dense_1 (Dense)              (None, 59)                10561     
_________________________________________________________________
dense_2 (Dense)              (None, 178)               10680     
=================================================================

  • Optimizer: sgd
  • Loss: mse
  • Activation function of the dense layers: relu
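A minimal sketch of a plain autoencoder matching the summary and hyper-parameters above (the 178 -> 59 -> 178 layer sizes and the relu activations are read off the summary and bullets; everything else is an assumption):

from tensorflow.keras import layers, models

plain_ae = models.Sequential([
    layers.Dense(178, activation="relu", input_shape=(178,)),
    layers.BatchNormalization(),
    layers.Dense(59, activation="relu"),    # bottleneck
    layers.Dense(178, activation="relu"),   # reconstruction of the 178 input values
])
plain_ae.compile(optimizer="sgd", loss="mse")
# plain_ae.summary() should reproduce the parameter counts shown above
# (31862 / 712 / 10561 / 10680)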
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 23, 178)           0         
_________________________________________________________________
gru (GRU)                    (None, 59)                42126     
_________________________________________________________________
repeat_vector (RepeatVector) (None, 23, 59)            0         
_________________________________________________________________
gru_1 (GRU)                  (None, 23, 178)           127092    
_________________________________________________________________
time_distributed (TimeDistri (None, 23, 178)           31862     
=================================================================

  • Optimizer: sgd
  • Loss: mse
  • Activation function of the gru layers: relu
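A corresponding sketch of the GRU autoencoder summarized above (the latent size of 59 and the TimeDistributed Dense head are read off the summary; note the listed parameter counts match an older Keras GRU variant, so a current TF2 GRU will report slightly different numbers):

from tensorflow.keras import layers, models

timesteps, n_features, latent_dim = 23, 178, 59

gru_ae = models.Sequential([
    layers.GRU(latent_dim, activation="relu",
               input_shape=(timesteps, n_features)),     # encoder -> (None, 59)
    layers.RepeatVector(timesteps),                      # -> (None, 23, 59)
    layers.GRU(n_features, activation="relu",
               return_sequences=True),                   # decoder -> (None, 23, 178)
    layers.TimeDistributed(layers.Dense(n_features)),    # -> (None, 23, 178)
])
gru_ae.compile(optimizer="sgd", loss="mse")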
Answer

The two models you have above do not seem to be comparable in a meaningful way. The first model is attempting to compress your vectors of 178 values. It is quite possible that these vectors contain some redundant information, so it is reasonable to assume that you will be able to compress them.

The second model is attempting to compress a sequence of 23 x 178 vectors via a single GRU layer. This is a task with a significantly higher number of parameters. The RepeatVector simply takes the output of the 1st GRU layer (the encoder) and makes it the input of the 2nd GRU layer (the decoder). But then you take a single value of the decoder. Instead of the TimeDistributed layer, I'd recommend that you use return_sequences=True in the 2nd GRU (decoder). Otherwise you are saying that you expect the 23x178 sequence to be made up of elements that all have the same value; that has to lead to a very high error / no solution.
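One way to read that suggestion, sketched with the same assumed sizes as above (this is an illustration, not the answerer's code): let the decoder GRU itself emit the full 23-step sequence, so the mse loss compares every timestep of the reconstruction against the input.

from tensorflow.keras import layers, models

timesteps, n_features, latent_dim = 23, 178, 59

seq_ae = models.Sequential([
    layers.GRU(latent_dim, input_shape=(timesteps, n_features)),  # encoder: last state only
    layers.RepeatVector(timesteps),                               # repeat the state per timestep
    layers.GRU(n_features, return_sequences=True),                # decoder returns all 23 steps
])
seq_ae.compile(optimizer="sgd", loss="mse")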

I'd recommend you take a step back. Is your goal to find similarity between the sequences? Or to be able to make predictions? An auto-encoder approach is preferable for a similarity task. In order to make predictions, I'd recommend that you go more towards an approach where you apply a Dense(1) layer to the output of the sequence steps.
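A hedged sketch of that prediction-oriented alternative (the unit count and the one-value-per-timestep target are assumptions made only for illustration):

from tensorflow.keras import layers, models

timesteps, n_features = 23, 178

predictor = models.Sequential([
    layers.GRU(64, return_sequences=True,
               input_shape=(timesteps, n_features)),  # 64 units is an arbitrary choice
    layers.TimeDistributed(layers.Dense(1)),          # one predicted value per timestep
])
predictor.compile(optimizer="sgd", loss="mse")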

Is your data-set open / available? I'd be curious to take it for a spin if that would be possible.

