Deep autoencoder in Keras converting one dimension to another i


Problem description

I am doing an image captioning task using vectors for representing both images and captions.

The caption vectors have a length/dimension of 128. The image vectors have a length/dimension of 2048.

What I want to do is to train an autoencoder, to get an encoder which is able to convert a text vector into an image vector, and a decoder which is able to convert an image vector back into a text vector.

Encoder: 128 -> 2048.

Decoder: 2048 -> 128.

I followed this tutorial to implement a shallow network doing what I wanted.

But I can't figure out how to create a deep network following the same tutorial.

from keras.layers import Input, Dense
from keras.models import Model

x_dim = 128   # caption vector length
y_dim = 2048  # image vector length
x_dim_shape = Input(shape=(x_dim,))
encoded = Dense(512, activation='relu')(x_dim_shape)
encoded = Dense(1024, activation='relu')(encoded)
encoded = Dense(y_dim, activation='relu')(encoded)

decoded = Dense(1024, activation='relu')(encoded)
decoded = Dense(512, activation='relu')(decoded)
decoded = Dense(x_dim, activation='sigmoid')(decoded)

# this model maps an input to its reconstruction
autoencoder = Model(input=x_dim_shape, output=decoded)

# this model maps an input to its encoded representation
encoder = Model(input=x_dim_shape, output=encoded)

encoded_input = Input(shape=(y_dim,))
decoder_layer1 = autoencoder.layers[-3]
decoder_layer2 = autoencoder.layers[-2]
decoder_layer3 = autoencoder.layers[-1]

# create the decoder model
decoder = Model(input=encoded_input, output=decoder_layer3(decoder_layer2(decoder_layer1(encoded_input))))

autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')


autoencoder.fit(training_data_x, training_data_y,
                nb_epoch=50,
                batch_size=256,
                shuffle=True,
                validation_data=(test_data_x, test_data_y))

The training_data_x and test_data_x have 128 dimensions. The training_data_y and test_data_y have 2048 dimensions.

The error I receive while trying to run this is the following:

Exception: Error when checking model target: expected dense_6 to have shape (None, 128) but got array with shape (32360, 2048)

dense_6 is the last decoded variable.
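
Checking the shapes confirms the mismatch (a minimal sketch reusing the objects defined above): the model's output is the final 128-unit Dense layer, while the array passed as the target to fit() holds the 2048-dimensional image vectors.

print(autoencoder.output_shape)   # (None, 128) -- the last Dense(x_dim) layer
print(training_data_y.shape)      # (32360, 2048) -- the array passed as the target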

Recommended answer

Autoencoders

If what you want is to be able to call the encoder and decoder separately, what you need to do is train the whole autoencoder exactly as per the tutorial, with input_shape == output_shape (== 128 in your case), and only then can you call a subset of the layers:

x_dim = 128
y_dim = 2048
x_dim_shape = Input(shape=(x_dim,))
encoded = Dense(512, activation='relu')(x_dim_shape)
encoded = Dense(1024, activation='relu')(encoded)
encoded = Dense(y_dim, activation='relu')(encoded)

decoded = Dense(1024, activation='relu')(encoded)
decoded = Dense(512, activation='relu')(decoded)
decoded = Dense(x_dim, activation='sigmoid')(decoded)

# this model maps an input to its reconstruction
autoencoder = Model(input=x_dim_shape, output=decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
autoencoder.fit(training_data_x, training_data_x,  # x == y: the target is the input itself
                nb_epoch=50,
                batch_size=256,
                shuffle=True,
                validation_data=(test_data_x, test_data_x))

# test the decoder model
encoded_input = Input(shape=(y_dim,))
decoder_layer1 = autoencoder.layers[-3]
decoder_layer2 = autoencoder.layers[-2]
decoder_layer3 = autoencoder.layers[-1]

decoder = Model(input=encoded_input, output=decoder_layer3(decoder_layer2(decoder_layer1(encoded_input))))
decoder.compile(optimizer='adadelta', loss='binary_crossentropy')
score = decoder.evaluate(test_data_y, test_data_x)
print('Decoder evaluation: {:.2f}'.format(score))

Notice that, when calling autoencoder.fit(), x == y in the arguments. This is how the autoencoder would (normally) have to optimize the bottleneck representation (what you call y in your own code) to best fit the original image with fewer dimensions.
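
As a minimal sketch of what calling a subset of the layers looks like after training (the encoder model is not built in the snippet above, so constructing it here follows the same pattern as the question's code and is an assumption):

# Sketch only: the 128 -> 2048 encoder half, built from the already-trained layers,
# chained with the decoder defined above.
encoder = Model(input=x_dim_shape, output=encoded)   # captions -> 2048-d bottleneck
bottleneck = encoder.predict(test_data_x)            # shape: (n_samples, 2048)
reconstructed = decoder.predict(bottleneck)          # shape: (n_samples, 128)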

But, as a transition to the second part of this answer, notice that in your case x_dim < y_dim. You are actually training a model to increase the data dimensionality, which doesn't make much sense, AFAICT.

Now, reading your question again, I don't think autoencoders are any good for what you want to achieve. They are designed to reduce the dimensionality of the data, with a minimum of casualties.

What you are actually trying to do is:

  1. Render a text to an image (what you call encode)
  2. Read a text from an image (what you call decode)

In my understanding, while 2. might indeed require some machine learning, 1. definitely doesn't: there are plenty of libraries to write text on images out there.
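
For instance, a minimal sketch with Pillow (one such library; the text, image size and file name here are only placeholders):

# Render a caption string onto a blank image with Pillow's ImageDraw.
from PIL import Image, ImageDraw

img = Image.new('RGB', (256, 64), color='white')
draw = ImageDraw.Draw(img)
draw.text((10, 25), 'a caption rendered as pixels', fill='black')
img.save('caption.png')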
