Train Stacked Autoencoder Correctly


Problem Description


I am trying to build a stacked autoencoder in Keras (tf.keras). By stacked I do not mean deep. All the examples I found for Keras generate, e.g., 3 encoder layers and 3 decoder layers, train the whole thing at once, and call it a day. However, it seems the correct way to train a stacked autoencoder (SAE) is the one described in this paper: Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion

In short, an SAE should be trained layer-wise, as shown in the image below. After layer 1 is trained, its output is used as the input to train layer 2. The reconstruction loss should be compared against layer 1's output, not the input layer.

And here is where my trouble begins: how do I tell Keras which layers to apply the loss function to?

Here is what I do. Since the Autoencoder module no longer exists in Keras, I build the first autoencoder, and I set its encoder's weights (trainable = False) in the first layer of a second autoencoder with two layers in total. Then when I train it, it obviously compares the reconstructed layer out_s2 with the input layer in_s, instead of with layer 1, hid1.

# autoencoder layer 1
in_s = tf.keras.Input(shape=(input_size,))
noise = tf.keras.layers.Dropout(0.1)(in_s)
hid = tf.keras.layers.Dense(nodes[0], activation='relu')(noise)
out_s = tf.keras.layers.Dense(input_size, activation='sigmoid')(hid)

ae_1 = tf.keras.Model(in_s, out_s, name="ae_1")
ae_1.compile(optimizer='nadam', loss='binary_crossentropy', metrics=['acc'])

# autoencoder layer 2
hid1 = tf.keras.layers.Dense(nodes[0], activation='relu')(in_s)
noise = tf.keras.layers.Dropout(0.1)(hid1)
hid2 = tf.keras.layers.Dense(nodes[1], activation='relu')(noise)
out_s2 = tf.keras.layers.Dense(nodes[0], activation='sigmoid')(hid2)

ae_2 = tf.keras.Model(in_s, out_s2, name="ae_2")
ae_2.layers[0].set_weights(ae_1.layers[0].get_weights())
ae_2.layers[0].trainable = False

ae_2.compile(optimizer='nadam', loss='binary_crossentropy', metrics=['acc'])

The solution should be fairly simple, but I can't see it, nor can I find it online. How do I do that in Keras?

Solution

Judging by the comments, the question seems outdated. But I'll still answer it, since the use case mentioned in this question is not specific to autoencoders and might be helpful in other situations.

So, when you say "train the whole network layer by layer", I would rather interpret it as "train small single-layer networks in sequence".

Looking at the code posted in the question, it seems the OP has already built small networks. But neither of these networks consists of a single layer.

The second autoencoder here takes as its input the input of the first autoencoder. However, it should actually take the output of the first autoencoder as its input.

So then, you train the first autoencoder and collect its predictions after it is trained. Then you train the second autoencoder, which takes as input the output (predictions) of the first autoencoder.
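
Here is a minimal sketch of that scheme, assuming training data x_train and the input_size / nodes definitions from the question; the epoch and batch-size values are placeholders, not prescriptions from the original answer.

import tensorflow as tf

# Assumptions (not in the original post): x_train is the training data,
# and input_size / nodes are defined as in the question.

# --- Stage 1: train the first autoencoder against the raw input ---
in_1 = tf.keras.Input(shape=(input_size,))
noise_1 = tf.keras.layers.Dropout(0.1)(in_1)
hid_1 = tf.keras.layers.Dense(nodes[0], activation='relu')(noise_1)
out_1 = tf.keras.layers.Dense(input_size, activation='sigmoid')(hid_1)

ae_1 = tf.keras.Model(in_1, out_1, name="ae_1")
ae_1.compile(optimizer='nadam', loss='binary_crossentropy', metrics=['acc'])
ae_1.fit(x_train, x_train, epochs=10, batch_size=32)

# Collect the trained autoencoder's predictions; they become both the
# input and the reconstruction target of the next stage.
preds_1 = ae_1.predict(x_train)

# --- Stage 2: train the second autoencoder on autoencoder 1's output ---
in_2 = tf.keras.Input(shape=(input_size,))
noise_2 = tf.keras.layers.Dropout(0.1)(in_2)
hid_2 = tf.keras.layers.Dense(nodes[1], activation='relu')(noise_2)
out_2 = tf.keras.layers.Dense(input_size, activation='sigmoid')(hid_2)

ae_2 = tf.keras.Model(in_2, out_2, name="ae_2")
ae_2.compile(optimizer='nadam', loss='binary_crossentropy', metrics=['acc'])
ae_2.fit(preds_1, preds_1, epochs=10, batch_size=32)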

Now let's focus on this part: "After layer 1 is trained, its output is used as the input to train layer 2. The reconstruction loss should be compared against layer 1's output, not the input layer."

Since the second network takes the output of layer 1 (autoencoder 1 in the OP's case) as its input, it will be comparing its own output against that. The task is achieved.

But to achieve this, you need to write the model.fit(...) calls, which are missing from the code provided in the question.

Also, in case you want the model to compute the loss against the input layer instead, simply replace the y argument in model.fit(...) with the input of autoencoder 1.
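
Under the same assumptions as the sketch above, that change is a single line:

# Compare ae_2's reconstruction against the raw input instead of
# autoencoder 1's output (x_train and preds_1 as defined above).
ae_2.fit(preds_1, x_train, epochs=10, batch_size=32)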

In short, you just need to decouple these autoencoders into tiny single-layer networks and then train them as you wish. There is no need for trainable = False here, though you can still use it if you want.

