LSTM autoencoder always returns the average of the input sequence


Problem description

I'm trying to build a very simple LSTM autoencoder with PyTorch. I always train it with the same data:

x = torch.Tensor([[0.0], [0.1], [0.2], [0.3], [0.4]])

I have built my model following this link:

inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)

decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)

sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)

My code runs with no errors, but y_pred converges to:

tensor([[[0.2]],
        [[0.2]],
        [[0.2]],
        [[0.2]],
        [[0.2]]], grad_fn=<StackBackward>)

Here is my code:

import torch
import torch.nn as nn
import torch.optim as optim


class LSTM(nn.Module):

    def __init__(self, input_dim, latent_dim, batch_size, num_layers):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.latent_dim = latent_dim
        self.batch_size = batch_size
        self.num_layers = num_layers

        self.encoder = nn.LSTM(self.input_dim, self.latent_dim, self.num_layers)

        self.decoder = nn.LSTM(self.latent_dim, self.input_dim, self.num_layers)

    def init_hidden_encoder(self):
        return (torch.zeros(self.num_layers, self.batch_size, self.latent_dim),
                torch.zeros(self.num_layers, self.batch_size, self.latent_dim))

    def init_hidden_decoder(self):
        return (torch.zeros(self.num_layers, self.batch_size, self.input_dim),
                torch.zeros(self.num_layers, self.batch_size, self.input_dim))

    def forward(self, input):
        # Reset hidden layer
        self.hidden_encoder = self.init_hidden_encoder()
        self.hidden_decoder = self.init_hidden_decoder()

        # Reshape input
        input = input.view(len(input), self.batch_size, -1)

        # Encode
        encoded, self.hidden = self.encoder(input, self.hidden_encoder)
        encoded = encoded[-1].repeat(5, 1, 1)

        # Decode
        y, self.hidden = self.decoder(encoded, self.hidden_decoder)
        return y


model = LSTM(input_dim=1, latent_dim=20, batch_size=1, num_layers=1)
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.0001)

x = torch.Tensor([[0.0], [0.1], [0.2], [0.3], [0.4]])

while True:
    y_pred = model(x)
    optimizer.zero_grad()
    loss = loss_function(y_pred, x)
    loss.backward()
    optimizer.step()
    print(y_pred)

Answer

1. Initializing hidden states

In your source code you are using the init_hidden_encoder and init_hidden_decoder functions to zero the hidden states of both recurrent units in every forward pass.

In PyTorch you don't have to do that: if no initial hidden state is passed to an RNN cell (be it LSTM, GRU or RNN, the ones currently available by default in PyTorch), it is implicitly initialized with zeros.
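As a quick sanity check (my own illustration, not part of the original answer), calling an LSTM without an initial state produces exactly the same output as passing explicit zero tensors:

import torch
import torch.nn as nn

# Toy check: an LSTM called without an (h_0, c_0) argument behaves as if
# zero tensors had been passed for the initial state.
lstm = nn.LSTM(input_size=1, hidden_size=20, num_layers=1)
x = torch.zeros(5, 1, 1)  # sequence x batch x features
zeros = (torch.zeros(1, 1, 20), torch.zeros(1, 1, 20))

out_implicit, _ = lstm(x)         # no initial state passed
out_explicit, _ = lstm(x, zeros)  # explicit zero state
print(torch.allclose(out_implicit, out_explicit))  # True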

So, to keep the code equivalent to your initial solution (which simplifies the next parts), I will scrap the unneeded parts, which leaves us with the model seen below:

class LSTM(nn.Module):
    def __init__(self, input_dim, latent_dim, num_layers):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.latent_dim = latent_dim
        self.num_layers = num_layers

        self.encoder = nn.LSTM(self.input_dim, self.latent_dim, self.num_layers)

        self.decoder = nn.LSTM(self.latent_dim, self.input_dim, self.num_layers)

    def forward(self, input):
        # Encode
        _, (last_hidden, _) = self.encoder(input)
        encoded = last_hidden.repeat(5, 1, 1)

        # Decode
        y, _ = self.decoder(encoded)
        return torch.squeeze(y)

Addition of torch.squeeze

We don't need any superfluous dimensions (like the 1 in [5, 1, 1]). Actually, this is the clue to why your results are equal to 0.2.
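To see what squeeze does to the shapes here (a minimal illustration, not part of the original answer):

import torch

# The decoder output is [seq_len, batch, features] = [5, 1, 1]; squeezing
# drops the size-1 dimensions so it can be compared against a matching target.
y_pred = torch.zeros(5, 1, 1)
print(y_pred.shape)                 # torch.Size([5, 1, 1])
print(torch.squeeze(y_pred).shape)  # torch.Size([5])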

Furthermore, I left the input reshape out of the network (in my opinion, the network should be fed input that is ready to be processed), to strictly separate the two tasks (input preparation and the model itself).

This approach gives us the following setup code and training loop:

model = LSTM(input_dim=1, latent_dim=20, num_layers=1)
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.0001)

y = torch.Tensor([[0.0], [0.1], [0.2], [0.3], [0.4]])
# Sequence x batch x dimension
x = y.view(len(y), 1, -1)

while True:
    y_pred = model(x)
    optimizer.zero_grad()
    loss = loss_function(y_pred, y)
    loss.backward()
    optimizer.step()
    print(y_pred)

The whole network is identical to yours (for now), except that it is more succinct and readable.

As your provided Keras code indicates, what we want to do (and you are actually doing it correctly) is to obtain the last hidden state from the encoder (it encodes our entire sequence) and decode the sequence from that state to obtain the original one.

BTW, this approach is called sequence to sequence, or seq2seq for short (often used in tasks like language translation). Well, maybe a variation of that approach, but I would classify it as that anyway.

PyTorch gives us the last hidden state as a separate return value from the RNN family, and I would advise against your encoded[-1]. The reason is the bidirectional and multi-layer case: say you wanted to sum the bidirectional output, that would mean code along these lines:

# batch_size and hidden_size should be inferred cluttering the code further    
encoded[-1].view(batch_size, 2, hidden_size).sum(dim=1)

And that's why the line _, (last_hidden, _) = self.encoder(input) was used.
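For completeness, here is a hedged sketch (my own, not part of the original answer) of how the returned hidden state makes the bidirectional case easier, since it already separates layers and directions:

import torch
import torch.nn as nn

# Hypothetical bidirectional encoder; shapes follow PyTorch's documented
# layout of h_n: [num_layers * num_directions, batch, hidden_size].
num_layers, batch_size, hidden_size = 1, 1, 20
encoder = nn.LSTM(1, hidden_size, num_layers, bidirectional=True)

x = torch.zeros(5, batch_size, 1)
_, (last_hidden, _) = encoder(x)

last_hidden = last_hidden.view(num_layers, 2, batch_size, hidden_size)
summed = last_hidden[-1].sum(dim=0)  # sum both directions of the last layer
print(summed.shape)                  # torch.Size([1, 20])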

Actually, the mistake on your side was only in this last part.

The output shapes of your predictions and targets:

# Your output
torch.Size([5, 1, 1])
# Your target
torch.Size([5, 1])

If those shapes are provided, MSELoss, by default, uses the argument size_average=True. And yes, it averages your targets and your output, which essentially calculates the loss for the average of your tensor (around 2.5 at the beginning) and the average of your target, which is 0.2.

Thus the network converges correctly, but your targets are wrong.
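The shape mismatch can be reproduced in isolation (a small illustration using the values from the question; recent PyTorch versions also print a warning here):

import torch
import torch.nn as nn

# A [5, 1, 1] prediction against a [5, 1] target broadcasts to [5, 5, 1]:
# every prediction is compared with every target, so the best constant
# output is the mean of the targets, i.e. 0.2.
y_pred = torch.full((5, 1, 1), 0.2)
y_true = torch.tensor([[0.0], [0.1], [0.2], [0.3], [0.4]])

print((y_pred - y_true).shape)       # torch.Size([5, 5, 1])
print(nn.MSELoss()(y_pred, y_true))  # tensor(0.0200)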

You could provide MSELoss with the argument reduction="sum", though that is really a temporary fix that only works by accident. The network will at first try to make all of the outputs equal to the sum (0 + 0.1 + 0.2 + 0.3 + 0.4 = 1.0), starting with semi-random outputs; after a while it will converge to what you want, but not for the reasons you want!

The identity function is the easiest choice here, even compared to summation (as your input data is really simple).

Just pass appropriately shaped tensors to the loss function, e.g. batch x outputs; in your case, the final part would look like this:

model = LSTM(input_dim=1, latent_dim=20, num_layers=1)
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters())

y = torch.Tensor([0.0, 0.1, 0.2, 0.3, 0.4])
x = y.view(len(y), 1, -1)

while True:
    y_pred = model(x)
    optimizer.zero_grad()
    loss = loss_function(y_pred, y)
    loss.backward()
    optimizer.step()
    print(y_pred)

Your target is one-dimensional (as the batch is of size 1) and so is your output (after squeezing the unnecessary dimensions).
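If you want to double-check this (using the model, x and y defined in the snippet just above):

# Both tensors are one-dimensional, so MSELoss compares them element-wise.
print(y.shape)         # torch.Size([5])
print(model(x).shape)  # torch.Size([5])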

I changed Adam's parameters back to the defaults, as it converges faster that way.

For brevity, here is the code and the results:

import torch
import torch.nn as nn
import torch.optim as optim


class LSTM(nn.Module):
    def __init__(self, input_dim, latent_dim, num_layers):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.latent_dim = latent_dim
        self.num_layers = num_layers

        self.encoder = nn.LSTM(self.input_dim, self.latent_dim, self.num_layers)

        self.decoder = nn.LSTM(self.latent_dim, self.input_dim, self.num_layers)

    def forward(self, input):
        # Encode
        _, (last_hidden, _) = self.encoder(input)
        # It is way more general that way
        encoded = last_hidden.repeat(input.shape)

        # Decode
        y, _ = self.decoder(encoded)
        return torch.squeeze(y)


model = LSTM(input_dim=1, latent_dim=20, num_layers=1)
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters())

y = torch.Tensor([0.0, 0.1, 0.2, 0.3, 0.4])
x = y.view(len(y), 1, -1)

while True:
    y_pred = model(x)
    optimizer.zero_grad()
    loss = loss_function(y_pred, y)
    loss.backward()
    optimizer.step()
    print(y_pred)

And here are the results after ~60k steps (it actually gets stuck after ~20k steps; you may want to improve your optimization and play around with the hidden size for better results):

step=59682                       
tensor([0.0260, 0.0886, 0.1976, 0.3079, 0.3962], grad_fn=<SqueezeBackward0>)

Additionally, L1Loss (a.k.a. Mean Absolute Error) may get better results in this case:

step=10645                        
tensor([0.0405, 0.1049, 0.1986, 0.3098, 0.4027], grad_fn=<SqueezeBackward0>)
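If you want to try that, the only change to the training loop above is the loss function; everything else stays the same:

import torch.nn as nn

# Swap mean squared error for mean absolute error.
loss_function = nn.L1Loss()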

Tuning and correctly batching this network is left to you; I hope you'll have some fun with it now and that you get the idea. :)

PS. I repeat the entire shape of the input sequence, as it is a more general approach and should work with batches and more dimensions out of the box.
