LSTM/GRU autoencoder convergence


Problem description


I have a strange situation trying to create an efficient autoencoder over my time series dataset:
X_train (200, 23, 178) X_val (100, 23, 178) X_test (100, 23, 178)


With a simple autoencoder I get better results than with my simple LSTM AE over a dataset of time series.
I have some concerns about my use of the RepeatVector wrapper layer, which, as far as I understand, is supposed to repeat the last state of the LSTM/GRU cell a number of times equal to the sequence length, in order to fit the input shape of the decoder layer.


The model does not raise any error, but the results are still an order of magnitude worse than the simple AE, while I would expect them to be at least as good, since I am using an architecture that should properly fit the domain problem. Moreover, the reconstruction does not look good at all, just noise.

Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 178)               31862     
_________________________________________________________________
batch_normalization (BatchNo (None, 178)               712       
_________________________________________________________________
dense_1 (Dense)              (None, 59)                10561     
_________________________________________________________________
dense_2 (Dense)              (None, 178)               10680     
=================================================================

  • Optimizer: sgd
  • Loss: mse
  • Activation function of the dense layers: relu
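
For reference, a minimal sketch of how this dense autoencoder could be reconstructed in tf.keras, assuming the layer sizes from the summary and the settings listed above (sgd, mse, relu); the 178-dimensional per-sample input and the Sequential wiring are assumptions, not something stated in the question.

# Hypothetical reconstruction of the dense AE summarised above (assumed tf.keras).
from tensorflow import keras
from tensorflow.keras import layers

dense_ae = keras.Sequential([
    layers.Input(shape=(178,)),            # one 178-dimensional vector per sample (assumption)
    layers.Dense(178, activation='relu'),  # 178*178 + 178 = 31,862 params
    layers.BatchNormalization(),           # 4*178 = 712 params
    layers.Dense(59, activation='relu'),   # bottleneck: 178*59 + 59 = 10,561 params
    layers.Dense(178, activation='relu'),  # reconstruction: 59*178 + 178 = 10,680 params
])
dense_ae.compile(optimizer='sgd', loss='mse')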
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 23, 178)           0         
_________________________________________________________________
gru (GRU)                    (None, 59)                42126     
_________________________________________________________________
repeat_vector (RepeatVector) (None, 23, 59)            0         
_________________________________________________________________
gru_1 (GRU)                  (None, 23, 178)           127092    
_________________________________________________________________
time_distributed (TimeDistri (None, 23, 178)           31862     
=================================================================

  • Optimizer: sgd
  • Loss: mse
  • Activation function of the GRU layers: relu
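
For completeness, here is a minimal sketch of how the GRU autoencoder summarised above could be written in tf.keras, assuming the shapes from the question (23 timesteps x 178 features) and the settings listed; reset_after=False is an assumption chosen only because it makes the GRU parameter counts (42,126 and 127,092) match the summary.

# Hypothetical reconstruction of the GRU AE summarised above (assumed tf.keras).
from tensorflow import keras
from tensorflow.keras import layers

timesteps, features, latent = 23, 178, 59

inputs = keras.Input(shape=(timesteps, features))                  # (None, 23, 178)
encoded = layers.GRU(latent, activation='relu',
                     reset_after=False)(inputs)                    # (None, 59)
repeated = layers.RepeatVector(timesteps)(encoded)                 # (None, 23, 59)
decoded = layers.GRU(features, activation='relu', reset_after=False,
                     return_sequences=True)(repeated)              # (None, 23, 178)
outputs = layers.TimeDistributed(layers.Dense(features))(decoded)  # (None, 23, 178)

gru_ae = keras.Model(inputs, outputs)
gru_ae.compile(optimizer='sgd', loss='mse')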

Am I making some huge error in my assumptions while using those recurrent layers? Or would you have some suggestions on how to debug this?

Answer


The two models you have above do not seem to be comparable in a meaningful way. The first model is attempting to compress your vectors of 178 values. It is quite possible that these vectors contain some redundant information, so it is reasonable to assume that you will be able to compress them.


The second model is attempting to compress a sequence of 23 x 178 vectors via a single GRU layer. This is a task with a significantly higher number of parameters. The RepeatVector simply takes the output of the 1st GRU layer (the encoder) and makes it the input of the 2nd GRU layer (the decoder). But then you take a single value of the decoder. Instead of the TimeDistributed layer, I'd recommend that you use return_sequences=True in the 2nd GRU (the decoder). Otherwise you are saying that you expect the 23x178 sequence to consist of elements that all have the same value; that has to lead to a very high error / no solution.
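
For concreteness, a minimal sketch of the decoder arrangement described here, assuming tf.keras and the shapes from the question; the reconstruction is taken directly from a decoder GRU with return_sequences=True, with no TimeDistributed layer on top. Layer sizes and activations are carried over from the question as assumptions.

# Hypothetical sketch of the suggested decoder change (assumed tf.keras).
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(23, 178))
encoded = layers.GRU(59, activation='relu')(inputs)   # encoder state, (None, 59)
repeated = layers.RepeatVector(23)(encoded)           # fed to every decoder step, (None, 23, 59)
# return_sequences=True makes the decoder emit one 178-dim vector per timestep,
# so the reconstruction is not a single value repeated 23 times.
decoded = layers.GRU(178, activation='relu', return_sequences=True)(repeated)

suggested_ae = keras.Model(inputs, decoded)
suggested_ae.compile(optimizer='sgd', loss='mse')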


I'd recommend you take a step back. Is your goal to find similarity between the sequences? Or to be able to make predictions? An autoencoder approach is preferable for a similarity task. In order to make predictions, I'd recommend that you move more towards an approach where you apply a Dense(1) layer to the output of the sequence steps.
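
A hedged sketch of the prediction-oriented alternative mentioned here, assuming tf.keras; the per-timestep target, the layer sizes, and the choice of a GRU are illustrative assumptions rather than anything given in the question or the answer.

# Hypothetical prediction model with a Dense(1) head (assumed tf.keras).
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(23, 178))
hidden = layers.GRU(59, activation='relu', return_sequences=True)(inputs)
# Dense(1) is applied to the output of each sequence step, giving one
# predicted value per timestep; the target variable is an assumption.
preds = layers.Dense(1)(hidden)                       # (None, 23, 1)

predictor = keras.Model(inputs, preds)
predictor.compile(optimizer='sgd', loss='mse')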


Is your dataset open/available? I'd be curious to take it for a spin if that would be possible.

