Adding data to the decoder in an autoencoder during learning


Question

I want to implement an autoencoder using Keras. It is part of a larger network in which some operations are applied to the output of the autoencoder, and two losses have to be considered. I attached an image that shows my proposed structure; the link is below.

Autoencoder structure

w has the same size as the input image. In this autoencoder I do not use max pooling, so the output of each phase has the same size as the input image. I want to send w and the latent space representation to the decoder part and then, after adding noise to the decoder output, try to extract w using the third part of the network. So I need my loss function to consider the difference between the input image and the latent space representation, and also between w and w'. But I have several problems with the implementation. I do not know how to add w to the decoder output, because the line "merge_encoded_w=cv2.merge(encoded,w)" produces an error and does not work. I am also not sure whether my loss function does what I need. Please help me with this code. I am a beginner and finding the solution is difficult for me. I asked this question before, but no one helped me with it. My code is below:

from keras.models import Sequential
from keras.layers import Input, Dense, Dropout, Activation,UpSampling2D,Conv2D, MaxPooling2D, GaussianNoise
from keras.models import Model
from keras.optimizers import SGD
from keras.datasets import mnist
from keras import regularizers
from keras import backend as K
import keras as k
import numpy as np
import matplotlib.pyplot as plt
import cv2
from time import time
from keras.callbacks import TensorBoard
# Embedding phase
##encoder

w=np.random.random((1, 28,28))
input_img = Input(shape=(28, 28, 1))  # adapt this if using `channels_first` image data format

x = Conv2D(8, (5, 5), activation='relu', padding='same')(input_img)
#x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
#x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(2, (3, 3), activation='relu', padding='same')(x)
encoded = Conv2D(1, (3, 3), activation='relu', padding='same')(x)
merge_encoded_w=cv2.merge(encoded,w)
#
#decoder

x = Conv2D(2, (5, 5), activation='relu', padding='same')(merge_encoded_w)
#x = UpSampling2D((2, 2))(x)
x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
#x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu',padding='same')(x)
#x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

#Extraction phase
decodedWithNois=k.layers.GaussianNoise(0.5)(decoded)
x = Conv2D(8, (5, 5), activation='relu', padding='same')(decodedWithNois)
#x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
#x = MaxPooling2D((2, 2), padding='same')(x)
final_image_watermark = Conv2D(2, (3, 3), activation='relu', padding='same')(x)


autoencoder = Model([input_img,w], [decoded,final_image_watermark(2)])
encoder=Model(input_img,encoded)
autoencoder.compile(optimizer='adadelta', loss=['mean_squared_error','mean_squared_error'],metrics=['accuracy'])
(x_train, _), (x_test, _) = mnist.load_data()
x_validation=x_train[1:10000,:,:]
x_train=x_train[10001:60000,:,:]
#
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_validation = x_validation.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))  # adapt this if using `channels_first` image data format
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))  # adapt this if using `channels_first` image data format
x_validation = np.reshape(x_validation, (len(x_validation), 28, 28, 1))  # adapt this if using `channels_first` image data format
autoencoder.fit(x_train, x_train,
                epochs=5,
                batch_size=128,
                shuffle=True,
                validation_data=(x_validation, x_validation),
                callbacks=[TensorBoard(log_dir='/tmp/autoencoder')])

decoded_imgs = autoencoder.predict(x_test)
encoded_imgs=encoder.predict(x_test)

Recommended answer

For this kind of large architecture, I suggest you build small pieces first, then put the pieces together. First, the encoder part. It receives an image of shape (28, 28, 1) and returns an encoded image of shape (28, 28, 1).

from keras.layers import Input, Concatenate, GaussianNoise
from keras.layers import Conv2D
from keras.models import Model

def make_encoder():
    image = Input((28, 28, 1))
    x = Conv2D(8, (5, 5), activation='relu', padding='same')(image)
    x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
    x = Conv2D(2, (3, 3), activation='relu', padding='same')(x)
    encoded =  Conv2D(1, (3, 3), activation='relu', padding='same')(x)

    return Model(inputs=image, outputs=encoded)
encoder = make_encoder()
encoder.summary()

#_________________________________________________________________
#Layer (type)                 Output Shape              Param #   
#=================================================================
#input_1 (InputLayer)         (None, 28, 28, 1)         0         
#_________________________________________________________________
#conv2d_1 (Conv2D)            (None, 28, 28, 8)         208       
#_________________________________________________________________
#conv2d_2 (Conv2D)            (None, 28, 28, 4)         292       
#_________________________________________________________________
#conv2d_3 (Conv2D)            (None, 28, 28, 2)         74        
#_________________________________________________________________
#conv2d_4 (Conv2D)            (None, 28, 28, 1)         19        
#=================================================================
#Total params: 593
#Trainable params: 593
#Non-trainable params: 0
#_________________________________________________________________

The shape transitions match the theory.
Next, the decoder part takes the encoded image merged with another array, shape (28, 28, 2), and recovers the original image, shape (28, 28, 1).

def make_decoder():
    encoded_merged = Input((28, 28, 2))
    x = Conv2D(2, (5, 5), activation='relu', padding='same')(encoded_merged)
    x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
    x = Conv2D(8, (3, 3), activation='relu',padding='same')(x)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x) 

    return Model(inputs=encoded_merged, outputs=decoded)
decoder = make_decoder()
decoder.summary()

#_________________________________________________________________
#Layer (type)                 Output Shape              Param #   
#=================================================================
#input_2 (InputLayer)         (None, 28, 28, 2)         0         
#_________________________________________________________________
#conv2d_5 (Conv2D)            (None, 28, 28, 2)         102       
#_________________________________________________________________
#conv2d_6 (Conv2D)            (None, 28, 28, 4)         76        
#_________________________________________________________________
#conv2d_7 (Conv2D)            (None, 28, 28, 8)         296       
#_________________________________________________________________
#conv2d_8 (Conv2D)            (None, 28, 28, 1)         73        
#=================================================================
#Total params: 547
#Trainable params: 547
#Non-trainable params: 0
#_________________________________________________________________
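
The (28, 28, 2) decoder input comes from stacking the encoded image and w along the channel axis. The following is a minimal NumPy sketch of that shape arithmetic, assuming channels-last data. The array names and the batch size are illustrative and not taken from the original code. It also shows why the w of shape (1, 28, 28) from the question would first need a channel axis and one copy per sample.

```python
import numpy as np

batch = 4  # hypothetical batch size, for illustration only

encoded = np.zeros((batch, 28, 28, 1))   # stand-in for the encoder output
w_raw = np.random.random((1, 28, 28))    # w as created in the question

# Add a trailing channel axis, then repeat w so each sample gets a copy.
w = np.repeat(w_raw[..., np.newaxis], batch, axis=0)

# Channel-wise stacking: two 1-channel arrays become one 2-channel array.
encoded_merged = np.concatenate([encoded, w], axis=3)

print(w.shape)
print(encoded_merged.shape)
```

A function such as cv2.merge operates on concrete image arrays, so it cannot be applied to a symbolic Keras tensor like the encoder output; inside the model the stacking has to be done by a layer.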

The model then tries to recover the W array as well. Its input is the reconstructed image plus noise, shape (28, 28, 1).

def make_w_predictor():
    decoded_noise = Input((28, 28, 1))
    x = Conv2D(8, (5, 5), activation='relu', padding='same')(decoded_noise)
    x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
    pred_w = Conv2D(1, (3, 3), activation='relu', padding='same')(x)  
    # reconsider activation (is W positive?)
    # should be filter=1 to match W
    return Model(inputs=decoded_noise, outputs=pred_w)

w_predictor = make_w_predictor()
w_predictor.summary()

#_________________________________________________________________
#Layer (type)                 Output Shape              Param #   
#=================================================================
#input_3 (InputLayer)         (None, 28, 28, 1)         0         
#_________________________________________________________________
#conv2d_9 (Conv2D)            (None, 28, 28, 8)         208       
#_________________________________________________________________
#conv2d_10 (Conv2D)           (None, 28, 28, 4)         292       
#_________________________________________________________________
#conv2d_11 (Conv2D)           (None, 28, 28, 1)         37        
#=================================================================
#Total params: 537
#Trainable params: 537
#Non-trainable params: 0
#_________________________________________________________________
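
The input to this sub-model is the reconstruction with additive noise. As far as I know, the Keras GaussianNoise layer adds zero-mean noise only during training and passes its input through unchanged at inference time. The NumPy sketch below imitates that behavior under this assumption; the function name gaussian_noise and its arguments are hypothetical and not part of Keras.

```python
import numpy as np

def gaussian_noise(x, stddev, training, rng):
    """Add zero-mean Gaussian noise to x, but only when training is True."""
    if not training:
        return x
    return x + rng.normal(loc=0.0, scale=stddev, size=x.shape)

rng = np.random.default_rng(0)
decoded = rng.random((4, 28, 28, 1))   # stand-in for the decoder output

noisy_train = gaussian_noise(decoded, 0.5, training=True, rng=rng)
noisy_infer = gaussian_noise(decoded, 0.5, training=False, rng=rng)

print(noisy_train.shape)
print(np.array_equal(noisy_infer, decoded))
```

With a standard deviation of 0.5 on values in [0, 1], many noisy pixels fall outside [0, 1], so the W predictor sees a heavily perturbed image during training.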

With all the pieces at hand, putting them together to build the entire model is not hard. Notice that the models built above can be used like layers.

def put_together(encoder, decoder, w_predictor):
    image = Input((28, 28, 1))
    w = Input((28, 28, 1))
    encoded = encoder(image)

    encoded_merged = Concatenate(axis=3)([encoded, w])
    decoded = decoder(encoded_merged)

    decoded_noise = GaussianNoise(0.5)(decoded)
    pred_w = w_predictor(decoded_noise)

    return Model(inputs=[image, w], outputs=[decoded, pred_w])

model = put_together(encoder, decoder, w_predictor)
model.summary()

#__________________________________________________________________________________________________
#Layer (type)                    Output Shape         Param #     Connected to                     
#==================================================================================================
#input_4 (InputLayer)            (None, 28, 28, 1)    0                                            
#__________________________________________________________________________________________________
#model_1 (Model)                 (None, 28, 28, 1)    593         input_4[0][0]                    
#__________________________________________________________________________________________________
#input_5 (InputLayer)            (None, 28, 28, 1)    0                                            
#__________________________________________________________________________________________________
#concatenate_1 (Concatenate)     (None, 28, 28, 2)    0           model_1[1][0]                    
#                                                                 input_5[0][0]                    
#__________________________________________________________________________________________________
#model_2 (Model)                 (None, 28, 28, 1)    547         concatenate_1[0][0]              
#__________________________________________________________________________________________________
#gaussian_noise_1 (GaussianNoise (None, 28, 28, 1)    0           model_2[1][0]                    
#__________________________________________________________________________________________________
#model_3 (Model)                 (None, 28, 28, 1)    537         gaussian_noise_1[0][0]           
#==================================================================================================
#Total params: 1,677
#Trainable params: 1,677
#Non-trainable params: 0
#__________________________________________________________________________________________________
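
The parameter counts in the summaries can be checked by hand. A Conv2D layer with bias has (kernel_height * kernel_width * input_channels + 1) * filters weights. The short sketch below applies that formula to the layers defined above; the helper name conv2d_params is made up for this illustration.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus biases of one Conv2D layer."""
    kh, kw = kernel
    return (kh * kw * in_channels + 1) * filters

# (kernel size, input channels, filters) for each layer, in order.
encoder_layers = [((5, 5), 1, 8), ((3, 3), 8, 4), ((3, 3), 4, 2), ((3, 3), 2, 1)]
decoder_layers = [((5, 5), 2, 2), ((3, 3), 2, 4), ((3, 3), 4, 8), ((3, 3), 8, 1)]
w_predictor_layers = [((5, 5), 1, 8), ((3, 3), 8, 4), ((3, 3), 4, 1)]

encoder_params = sum(conv2d_params(*layer) for layer in encoder_layers)
decoder_params = sum(conv2d_params(*layer) for layer in decoder_layers)
w_predictor_params = sum(conv2d_params(*layer) for layer in w_predictor_layers)
total_params = encoder_params + decoder_params + w_predictor_params

print(encoder_params, decoder_params, w_predictor_params, total_params)
# prints: 593 547 537 1677
```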

The code below trains the model with dummy data. Of course, you can use your own data as long as the shapes match.

import numpy as np

# dummy data
images = np.random.random((1000, 28, 28, 1))
w = np.random.lognormal(size=(1000, 28, 28, 1))

# is accuracy sensible metric for this model?
model.compile(optimizer='adadelta', loss='mse', metrics=['accuracy'])
model.fit([images, w], [images, w], batch_size=64, epochs=5)
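
With two outputs and loss='mse', Keras applies the loss to each output and, by default, minimizes their sum. The NumPy sketch below spells out that combined objective. The weights image_weight and w_weight are hypothetical knobs for this illustration (Keras exposes a similar idea through the loss_weights argument of compile), and the prediction arrays are random stand-ins, not model outputs.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over all elements."""
    return float(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(0)
images = rng.random((8, 28, 28, 1))     # targets for the first output
w = rng.random((8, 28, 28, 1))          # targets for the second output
decoded = rng.random((8, 28, 28, 1))    # stand-in for the reconstruction
pred_w = rng.random((8, 28, 28, 1))     # stand-in for the predicted w

image_weight, w_weight = 1.0, 1.0       # hypothetical per-output weights

image_loss = mse(images, decoded)
w_loss = mse(w, pred_w)
total_loss = image_weight * image_loss + w_weight * w_loss

print(image_loss, w_loss, total_loss)
```

Raising one weight relative to the other shifts the trade-off between reconstructing the image faithfully and recovering w.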

Edit

I have some questions about the code that you put here. In make_w_predictor, you said: "# reconsider activation (is W positive?) # should be filter=1 to match W". What does that mean? W is an array that contains 0 and 1. What does "reconsider activation" mean? Should I change the code for this part?

The relu activation returns non-negative numbers in [0, +inf), so it may not be a good choice if W takes a different set of values. Typical choices are as follows.

  • W can be positive or negative: "linear" activation.
  • W in [0, 1]: "sigmoid" activation.
  • W in [-1, 1]: "tanh" activation.
  • W is non-negative: "relu" activation.
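
The ranges in the list above can be checked numerically. This is a small NumPy sketch with the activations written out by hand rather than taken from Keras, evaluated on an arbitrary grid of inputs.

```python
import numpy as np

def linear(z):
    return z

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0.0)

z = np.linspace(-5.0, 5.0, 101)   # arbitrary pre-activation values

outputs = {
    "linear": linear(z),
    "sigmoid": sigmoid(z),
    "tanh": np.tanh(z),
    "relu": relu(z),
}

for name, out in outputs.items():
    print(f"{name:8s} min={out.min():+.3f} max={out.max():+.3f}")
```

For a W made of 0s and 1s, the sigmoid row is the one whose output range matches the targets.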

In the original code, you have:

w=np.random.random((1, 28, 28))

which takes values between 0 and 1. So I suggest switching from "relu" to "sigmoid". I did not make that change in my code sample because I was not sure whether the relu was intended.

You said the filter should be 1. Does that mean changing (3,3) to (1,1)? I am sorry for these questions, but I am a beginner and I cannot find some of the things you mention. Can you please help me and explain it completely?

I am referring to this line in the original question:

final_image_watermark = Conv2D(2, (3, 3), activation='relu', padding='same')(x)

If I understand correctly, this defines W' in the attached image, which should predict W, whose shape is (28, 28, 1). The first argument to Conv2D (the number of filters) should therefore be 1; otherwise the output shape becomes (28, 28, 2). I made this change in my code sample because otherwise it raises a shape mismatch error:

pred_w = Conv2D(1, (3, 3), activation='relu', padding='same')(x)

I think the (3, 3) part, which is the kernel size in Keras, is fine as is.
