Deep autoencoder in Keras converting one dimension to another


Problem description

I am doing an image captioning task using vectors for representing both images and captions.

The caption vectors have a length/dimension of 128. The image vectors have a length/dimension of 2048.

What I want to do is to train an autoencoder, to get an encoder that can convert a text vector into an image vector, and a decoder that can convert an image vector back into a text vector.

Encoder: 128 -> 2048.

Decoder: 2048 -> 128.

I followed this tutorial to implement a shallow network doing what I wanted.

But I can't figure out how to create a deep network following the same tutorial.

from keras.layers import Input, Dense
from keras.models import Model

x_dim = 128
y_dim = 2048
x_dim_shape = Input(shape=(x_dim,))
encoded = Dense(512, activation='relu')(x_dim_shape)
encoded = Dense(1024, activation='relu')(encoded)
encoded = Dense(y_dim, activation='relu')(encoded)

decoded = Dense(1024, activation='relu')(encoded)
decoded = Dense(512, activation='relu')(decoded)
decoded = Dense(x_dim, activation='sigmoid')(decoded)

# this model maps an input to its reconstruction
autoencoder = Model(input=x_dim_shape, output=decoded)

# this model maps an input to its encoded representation
encoder = Model(input=x_dim_shape, output=encoded)

encoded_input = Input(shape=(y_dim,))
decoder_layer1 = autoencoder.layers[-3]
decoder_layer2 = autoencoder.layers[-2]
decoder_layer3 = autoencoder.layers[-1]

# create the decoder model
decoder = Model(input=encoded_input, output=decoder_layer3(decoder_layer2(decoder_layer1(encoded_input))))

autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')


autoencoder.fit(training_data_x, training_data_y,
                nb_epoch=50,
                batch_size=256,
                shuffle=True,
                validation_data=(test_data_x, test_data_y))

The training_data_x and test_data_x have 128 dimensions. The training_data_y and test_data_y have 2048 dimensions.

The error I receive while trying to run this is the following:

Exception: Error when checking model target: expected dense_6 to have shape (None, 128) but got array with shape (32360, 2048)

dense_6 is the last decoded variable.
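The mismatch follows directly from the layer shapes: the final Dense(x_dim) layer fixes the model's output at 128 units, while fit() is handed 2048-dim targets. A minimal numpy sketch of the forward shapes (weights are random placeholders, not the trained model — only the dimensions matter here):

```python
import numpy as np

batch, x_dim, y_dim = 4, 128, 2048  # small batch just for illustration

def dense(x, out_dim):
    # stand-in for a Keras Dense layer: only the shapes matter here
    rng = np.random.default_rng(0)
    W = rng.standard_normal((x.shape[1], out_dim)) * 0.01
    return x @ W

x = np.zeros((batch, x_dim))
h = dense(x, 512)          # encoder
h = dense(h, 1024)
code = dense(h, y_dim)     # "encoded" representation: (batch, 2048)
h = dense(code, 1024)      # decoder
h = dense(h, 512)
out = dense(h, x_dim)      # final model output: (batch, 128)

print(out.shape)  # (4, 128): fit() therefore needs (batch, 128) targets
```

So passing training_data_y (shape (32360, 2048)) as the target of a model whose last layer is Dense(128) can never work, regardless of how deep the network is.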

Answer

自动编码器

如果您希望能够分别调用encoderdecoder,则需要执行的操作是完全按照本教程的要求训练整个自动编码器,并使用input_shape == output_shape(在您的情况下为== 128) ,然后您才能调用图层的子集:

Autoencoders

If you want to be able to call the encoder and decoder separately, what you need to do is train the whole autoencoder exactly as per the tutorial, with input_shape == output_shape (== 128 in your case), and only then can you call a subset of the layers:

from keras.layers import Input, Dense
from keras.models import Model

x_dim = 128
y_dim = 2048
x_dim_shape = Input(shape=(x_dim,))
encoded = Dense(512, activation='relu')(x_dim_shape)
encoded = Dense(1024, activation='relu')(encoded)
encoded = Dense(y_dim, activation='relu')(encoded)

decoded = Dense(1024, activation='relu')(encoded)
decoded = Dense(512, activation='relu')(decoded)
decoded = Dense(x_dim, activation='sigmoid')(decoded)

# this model maps an input to its reconstruction
autoencoder = Model(input=x_dim_shape, output=decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
autoencoder.fit(training_data_x, training_data_x, nb_epoch=50, batch_size=256, shuffle=True, validation_data=(test_data_x, test_data_x))

# test the decoder model
encoded_input = Input(shape=(y_dim,))
decoder_layer1 = autoencoder.layers[-3]
decoder_layer2 = autoencoder.layers[-2]
decoder_layer3 = autoencoder.layers[-1]

decoder = Model(input=encoded_input, output=decoder_layer3(decoder_layer2(decoder_layer1(encoded_input))))
decoder.compile(optimizer='adadelta', loss='binary_crossentropy')
score = decoder.evaluate(test_data_y, test_data_x)
print('Decoder evaluation: {:.2f}'.format(score))

Notice that, when calling autoencoder.fit(), x == y in the arguments. This is how the autoencoder would (normally) have to optimize the bottleneck representation (what you call y in your own code) to best fit the original input with fewer dimensions.

But, as a transition to the second part of this answer, notice that in your case, x_dim < y_dim. You are actually training a model to increase the data dimensionality, which doesn't make much sense, AFAICT.
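If the actual goal is a learned 128 → 2048 mapping from paired (caption, image) data, a plain supervised regression model is the natural tool rather than an autoencoder. As a hedged illustration of the idea (synthetic data and a linear map fitted by least squares; a Dense network trained with an MSE loss would be the nonlinear Keras analogue):

```python
import numpy as np

rng = np.random.default_rng(0)
n, x_dim, y_dim = 200, 128, 2048   # n is a hypothetical sample count

X = rng.standard_normal((n, x_dim))       # stand-ins for caption vectors
W_true = rng.standard_normal((x_dim, y_dim))
Y = X @ W_true                            # stand-ins for image vectors

# Fit a linear map Y ~ X @ W directly by least squares
W, residuals, rank, _ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ W

print(np.allclose(Y, Y_hat))   # recovers the mapping on this synthetic data
```

The point is that the 2048-dim targets are supervised labels here, not a reconstruction of the input, so no encoder/decoder symmetry is needed.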

Now reading your question again, I don't think autoencoders are any good for what you want to achieve. They are designed to reduce the dimensionality of the data with minimal loss of information.

What you want to do is:

  1. Render a text to an image (what you call encode)
  2. Read a text from an image (what you call decode)

In my understanding, while 2. might indeed require some machine learning, 1. definitely doesn't: there are plenty of libraries out there for writing text onto images.
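For instance, step 1. can be done in a few lines with the Pillow imaging library (library availability assumed; the caption text and image size are arbitrary):

```python
# Rendering a caption onto an image without any machine learning,
# using Pillow's default bitmap font
from PIL import Image, ImageDraw

img = Image.new('RGB', (320, 64), color='white')
draw = ImageDraw.Draw(img)
draw.text((10, 20), 'a cat sitting on a mat', fill='black')
img.save('caption.png')
```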
