LSTM autoencoder always returns the average of the input sequence


Problem description

I'm trying to build a very simple LSTM autoencoder with PyTorch. I always train it with the same data:

x = torch.Tensor([[0.0], [0.1], [0.2], [0.3], [0.4]])

I have built my model following this link:

inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)

decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)

sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)

My code runs with no errors, but y_pred converges to:

tensor([[[0.2]],
        [[0.2]],
        [[0.2]],
        [[0.2]],
        [[0.2]]], grad_fn=<StackBackward>)

Here is my code:

import torch
import torch.nn as nn
import torch.optim as optim


class LSTM(nn.Module):

    def __init__(self, input_dim, latent_dim, batch_size, num_layers):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.latent_dim = latent_dim
        self.batch_size = batch_size
        self.num_layers = num_layers

        self.encoder = nn.LSTM(self.input_dim, self.latent_dim, self.num_layers)

        self.decoder = nn.LSTM(self.latent_dim, self.input_dim, self.num_layers)

    def init_hidden_encoder(self):
        return (torch.zeros(self.num_layers, self.batch_size, self.latent_dim),
                torch.zeros(self.num_layers, self.batch_size, self.latent_dim))

    def init_hidden_decoder(self):
        return (torch.zeros(self.num_layers, self.batch_size, self.input_dim),
                torch.zeros(self.num_layers, self.batch_size, self.input_dim))

    def forward(self, input):
        # Reset hidden layer
        self.hidden_encoder = self.init_hidden_encoder()
        self.hidden_decoder = self.init_hidden_decoder()

        # Reshape input
        input = input.view(len(input), self.batch_size, -1)

        # Encode
        encoded, self.hidden = self.encoder(input, self.hidden_encoder)
        encoded = encoded[-1].repeat(5, 1, 1)

        # Decode
        y, self.hidden = self.decoder(encoded, self.hidden_decoder)
        return y


model = LSTM(input_dim=1, latent_dim=20, batch_size=1, num_layers=1)
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.0001)

x = torch.Tensor([[0.0], [0.1], [0.2], [0.3], [0.4]])

while True:
    y_pred = model(x)
    optimizer.zero_grad()
    loss = loss_function(y_pred, x)
    loss.backward()
    optimizer.step()
    print(y_pred)

Recommended answer

1. Initializing hidden states

In your source code, you are using the init_hidden_encoder and init_hidden_decoder functions to zero the hidden states of both recurrent units on every forward pass.

In PyTorch you don't have to do that: if no initial hidden state is passed to a recurrent cell (be it LSTM, GRU, or RNN, the ones currently available by default in PyTorch), it is implicitly initialized with zeros.
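
As a quick sanity check (my own sketch, not part of the original answer), you can verify that omitting the initial state gives the same output as passing explicit zeros:

import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=1, hidden_size=3, num_layers=1)
x = torch.randn(5, 1, 1)  # seq_len x batch x input_dim

# Explicit zero initial hidden and cell states
h0 = torch.zeros(1, 1, 3)
c0 = torch.zeros(1, 1, 3)
out_explicit, _ = lstm(x, (h0, c0))

# No initial state passed: PyTorch implicitly uses zeros
out_implicit, _ = lstm(x)

print(torch.allclose(out_explicit, out_implicit))  # True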

So, to obtain the same behavior as your initial solution (and to simplify the next parts), I will scrap the unneeded parts, which leaves us with the model seen below:

class LSTM(nn.Module):
    def __init__(self, input_dim, latent_dim, num_layers):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.latent_dim = latent_dim
        self.num_layers = num_layers

        self.encoder = nn.LSTM(self.input_dim, self.latent_dim, self.num_layers)

        self.decoder = nn.LSTM(self.latent_dim, self.input_dim, self.num_layers)

    def forward(self, input):
        # Encode
        _, (last_hidden, _) = self.encoder(input)
        encoded = last_hidden.repeat(5, 1, 1)

        # Decode
        y, _ = self.decoder(encoded)
        return torch.squeeze(y)

Addition of torch.squeeze

We don't need any superfluous dimensions (like the 1 in [5, 1, 1]). Actually, this is the clue to your results being equal to 0.2.
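
For illustration (a minimal sketch of my own), torch.squeeze removes all size-1 dimensions from the decoder output:

import torch

y = torch.zeros(5, 1, 1)       # decoder output: seq_len x batch x feature
print(torch.squeeze(y).shape)  # torch.Size([5])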

Furthermore, I left the input reshape out of the network (in my opinion, the network should be fed input that is ready to be processed), to strictly separate the two tasks (input preparation and the model itself).

This approach gives us the following setup code and training loop:

model = LSTM(input_dim=1, latent_dim=20, num_layers=1)
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.0001)

y = torch.Tensor([[0.0], [0.1], [0.2], [0.3], [0.4]])
# Sequence x batch x dimension
x = y.view(len(y), 1, -1)

while True:
    y_pred = model(x)
    optimizer.zero_grad()
    loss = loss_function(y_pred, y)
    loss.backward()
    optimizer.step()
    print(y_pred)

The whole network is identical to yours (for now), except that it is more succinct and readable.

As your provided Keras code indicates, what we want to do (and you are actually doing it correctly) is to obtain the last hidden state from the encoder (it encodes our entire sequence) and decode the sequence from this state to obtain the original one.

By the way, this approach is called sequence to sequence, or seq2seq for short (often used in tasks like language translation). Well, maybe a variation of that approach, but I would classify it as such anyway.

PyTorch provides the last hidden state as a separate return variable for the RNN family. I would advise against your encoded[-1]. The reason is the bidirectional and multilayered case. Say you wanted to sum the bidirectional output; it would mean code along these lines:

# batch_size and hidden_size should be inferred cluttering the code further    
encoded[-1].view(batch_size, 2, hidden_size).sum(dim=1)

And that's why the line _, (last_hidden, _) = self.encoder(input) was used.
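
As an illustration of that point (a sketch of my own, assuming a bidirectional, multi-layer encoder), last_hidden already separates layers and directions cleanly:

import torch
import torch.nn as nn

num_layers, hidden_size, batch_size = 2, 20, 1
encoder = nn.LSTM(input_size=1, hidden_size=hidden_size,
                  num_layers=num_layers, bidirectional=True)

x = torch.randn(5, batch_size, 1)
_, (h_n, _) = encoder(x)  # h_n: (num_layers * num_directions, batch, hidden)

# Separate layers and directions, keep the last layer, sum both directions
last_layer = h_n.view(num_layers, 2, batch_size, hidden_size)[-1]  # (2, batch, hidden)
summed = last_layer.sum(dim=0)                                     # (batch, hidden)
print(summed.shape)  # torch.Size([1, 20])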

Actually, it was a mistake on your side, and only in the last part.

Output shapes of your predictions and targets:

# Your output
torch.Size([5, 1, 1])
# Your target
torch.Size([5, 1])

If those shapes are provided, MSELoss, by default, uses the argument size_average=True. And yes, it averages your targets and your outputs, which essentially computes the loss between the average of your tensor (around 2.5 at the beginning) and the average of your target, which is 0.2.
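
Here is a minimal sketch of my own illustrating that mismatch: shapes (5, 1, 1) and (5, 1) broadcast against each other, so every prediction is compared with every target, and a constant prediction equal to the target mean (0.2) minimizes the loss:

import torch
import torch.nn as nn

loss_fn = nn.MSELoss()
target = torch.tensor([[0.0], [0.1], [0.2], [0.3], [0.4]])  # shape (5, 1)
pred = torch.full((5, 1, 1), 0.2)                           # shape (5, 1, 1)

# Broadcasting yields an effective shape of (5, 5, 1): each prediction is
# paired with each target (newer PyTorch versions warn about this).
mismatched = loss_fn(pred, target)
explicit = loss_fn(pred.expand(5, 5, 1), target.expand(5, 5, 1))
print(mismatched.item(), explicit.item())  # identical values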

So the network converges correctly, but your targets are wrong.

You could provide MSELoss with the argument reduction="sum", though that is really a temporary fix and only works accidentally. The network will first try to get all of the outputs to be equal to the sum (0 + 0.1 + 0.2 + 0.3 + 0.4 = 1.0), starting with semi-random outputs; after a while it will converge to what you want, but not for the reasons you want!

The identity function is the easiest choice here, easier even than summation (as your input data is really simple).

Just pass appropriately shaped tensors to the loss function, e.g. batch x outputs; in your case, the final part would look like this:

model = LSTM(input_dim=1, latent_dim=20, num_layers=1)
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters())

y = torch.Tensor([0.0, 0.1, 0.2, 0.3, 0.4])
x = y.view(len(y), 1, -1)

while True:
    y_pred = model(x)
    optimizer.zero_grad()
    loss = loss_function(y_pred, y)
    loss.backward()
    optimizer.step()
    print(y_pred)

Your target is one dimensional (as the batch is of size 1) and so is your output (after squeezing the unnecessary dimensions).

I changed Adam's parameters to defaults as it converges faster that way.

For brevity, here is the code and results:

import torch
import torch.nn as nn
import torch.optim as optim


class LSTM(nn.Module):
    def __init__(self, input_dim, latent_dim, num_layers):
        super(LSTM, self).__init__()
        self.input_dim = input_dim
        self.latent_dim = latent_dim
        self.num_layers = num_layers

        self.encoder = nn.LSTM(self.input_dim, self.latent_dim, self.num_layers)

        self.decoder = nn.LSTM(self.latent_dim, self.input_dim, self.num_layers)

    def forward(self, input):
        # Encode
        _, (last_hidden, _) = self.encoder(input)
        # It is way more general that way
        encoded = last_hidden.repeat(input.shape)

        # Decode
        y, _ = self.decoder(encoded)
        return torch.squeeze(y)


model = LSTM(input_dim=1, latent_dim=20, num_layers=1)
loss_function = nn.MSELoss()
optimizer = optim.Adam(model.parameters())

y = torch.Tensor([0.0, 0.1, 0.2, 0.3, 0.4])
x = y.view(len(y), 1, -1)

while True:
    y_pred = model(x)
    optimizer.zero_grad()
    loss = loss_function(y_pred, y)
    loss.backward()
    optimizer.step()
    print(y_pred)

And here are the results after ~60k steps (it actually gets stuck after ~20k steps; you may want to improve your optimization and play around with the hidden size for better results):

step=59682                       
tensor([0.0260, 0.0886, 0.1976, 0.3079, 0.3962], grad_fn=<SqueezeBackward0>)

Additionally, L1Loss (a.k.a. mean absolute error) may get better results in this case:

step=10645                        
tensor([0.0405, 0.1049, 0.1986, 0.3098, 0.4027], grad_fn=<SqueezeBackward0>)
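
If you want to try that, the only change needed in the training setup above would be swapping the criterion, e.g.:

import torch.nn as nn

loss_function = nn.L1Loss()  # mean absolute error instead of MSE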

Tuning and correct batching of this network is left to you; I hope you'll have some fun now and that you get the idea. :)

PS. I repeat the entire shape of the input sequence, as it is a more general approach and should work with batches and more dimensions out of the box.
