Autoencoder not learning identity function


Problem description

I'm somewhat new to machine learning in general, and I wanted to make a simple experiment to get more familiar with neural network autoencoders: To make an extremely basic autoencoder that would learn the identity function.

I'm using Keras to make life easier, so I did this first to make sure it works:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# X is the training data (a way to generate a similar dataset is shown below).
# Weights are given as [weights, biases], so we give
# the identity matrix for the weights and a vector of zeros for the biases
weights = [np.diag(np.ones(84)), np.zeros(84)]
model = Sequential([Dense(84, input_dim=84, weights=weights)])
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(X, X, nb_epoch=10, batch_size=8, validation_split=0.3)

As expected, the loss is zero, both in train and validation data:

Epoch 1/10
97535/97535 [==============================] - 27s - loss: 0.0000e+00 - val_loss: 0.0000e+00
Epoch 2/10
97535/97535 [==============================] - 28s - loss: 0.0000e+00 - val_loss: 0.0000e+00

Then I tried to do the same but without initializing the weights to the identity function, expecting that after a while of training it would learn it. It didn't. I've let it run for 200 epochs various times in different configurations, playing with different optimizers, loss functions, and adding L1 and L2 activity regularizers. The results vary, but the best I've got is still really bad, looking nothing like the original data, just being kinda in the same numeric range. The data is simply some numbers oscillating around 1.1. I don't know if an activation layer makes sense for this problem, should I be using one?
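For reference, a minimal sketch of the un-initialized variant described above (it simply reuses the setup from the first snippet; activations and L1/L2 activity regularizers would be the optional extras mentioned, not a known fix):

from keras.models import Sequential
from keras.layers import Dense

# Same single Dense layer as before, but with the default random weight
# initialization instead of the identity matrix. Activations and activity
# regularizers would be added to this Dense layer when experimenting.
model = Sequential([Dense(84, input_dim=84)])
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(X, X, nb_epoch=200, batch_size=8, validation_split=0.3)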

If this "neural network" of one layer can't learn something as simple as the identity function, how can I expect it to learn anything more complex? What am I doing wrong?

To have better context, here's a way to generate a dataset very similar to the one I'm using:

X = np.random.normal(1.1090579, 0.0012380764, (139336, 84))

I'm suspecting that the variations between the values might be too small. The loss function ends up having decent values (around 1e-6), but it's not enough precision for the result to have a similar shape to the original data. Maybe I should scale/normalize it somehow? Thanks for any advice!
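A quick sanity check makes that suspicion concrete (a sketch, using the synthetic dataset above): with a standard deviation of about 0.0012, the variance of the data is itself only about 1.5e-6, so a loss around 1e-6 is roughly what always predicting the mean would achieve.

import numpy as np

X = np.random.normal(1.1090579, 0.0012380764, (139336, 84))

# The variance of the data is (0.0012380764)**2, roughly 1.5e-6.
print(X.var())                       # ~1.5e-06
# The MSE of always predicting the global mean is exactly that variance,
# so a trained model with loss ~1e-6 is barely better than a constant guess.
print(((X - X.mean()) ** 2).mean())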

In the end, as it was suggested, the issue was with the dataset having too small variations between the 84 values, so the resulting prediction was actually pretty good in absolute terms (loss function) but comparing it to the original data, the variations were far off. I solved it by normalizing the 84 values in each sample around the sample's mean and dividing by the sample's standard deviation. Then I used the original mean and standard deviation to denormalize the predictions at the other end. I guess this could be done in a few different ways, but I did it by adding this normalization/denormalization into the model itself by using some Lambda layers that operated on the tensors. That way all the data processing was incorporated into the model, which made it nicer to work with. Let me know if you would like to see the actual code.
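A minimal sketch of that idea (this is not the asker's actual code; it assumes a Keras version with the functional API and Lambda layers, matching the nb_epoch-era API used above, and adds a small epsilon to keep the division stable):

from keras.models import Model
from keras.layers import Input, Dense, Lambda
from keras import backend as K

inp = Input(shape=(84,))

# Normalize each sample by its own mean and standard deviation
normalized = Lambda(
    lambda x: (x - K.mean(x, axis=1, keepdims=True))
              / (K.std(x, axis=1, keepdims=True) + 1e-8),
    output_shape=(84,))(inp)

# The layer that actually has to learn the identity mapping
encoded = Dense(84)(normalized)

# Undo the normalization using the statistics of the original input
denormalized = Lambda(
    lambda t: t[0] * (K.std(t[1], axis=1, keepdims=True) + 1e-8)
              + K.mean(t[1], axis=1, keepdims=True),
    output_shape=(84,))([encoded, inp])

model = Model(input=inp, output=denormalized)
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(X, X, nb_epoch=10, batch_size=8, validation_split=0.3)  # X as generated above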

Answer

I believe the problem could be either the number of epochs or the way you initialize X. I ran your code with an X of my own for 100 epochs and printed the argmax() and max values of the weights; it gets really close to the identity function.

I'm adding the code snippet that I used

from keras.models import Sequential
from keras.layers import Dense
import numpy as np
import random
import pandas as pd

X = np.array([[random.random() for r in xrange(84)] for i in xrange(1,100000)])
model = Sequential([Dense(84, input_dim=84)], name="layer1")
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(X, X, nb_epoch=100, batch_size=80, validation_split=0.3)

l_weights = np.round(model.layers[0].get_weights()[0],3)

print l_weights.argmax(axis=0)
print l_weights.max(axis=0)

I get:

Train on 69999 samples, validate on 30000 samples
Epoch 1/100
69999/69999 [==============================] - 1s - loss: 0.2092 - val_loss: 0.1564
Epoch 2/100
69999/69999 [==============================] - 1s - loss: 0.1536 - val_loss: 0.1510
Epoch 3/100
69999/69999 [==============================] - 1s - loss: 0.1484 - val_loss: 0.1459
.
.
.
Epoch 98/100
69999/69999 [==============================] - 1s - loss: 0.0055 - val_loss: 0.0054
Epoch 99/100
69999/69999 [==============================] - 1s - loss: 0.0053 - val_loss: 0.0053
Epoch 100/100
69999/69999 [==============================] - 1s - loss: 0.0051 - val_loss: 0.0051
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83]
[ 0.85000002  0.85100001  0.79799998  0.80500001  0.82700002  0.81900001
  0.792       0.829       0.81099999  0.80800003  0.84899998  0.829       0.852
  0.79500002  0.84100002  0.81099999  0.792       0.80800003  0.85399997
  0.82999998  0.85100001  0.84500003  0.847       0.79699999  0.81400001
  0.84100002  0.81        0.85100001  0.80599999  0.84500003  0.824
  0.81999999  0.82999998  0.79100001  0.81199998  0.829       0.85600001
  0.84100002  0.792       0.847       0.82499999  0.84500003  0.796
  0.82099998  0.81900001  0.84200001  0.83999997  0.815       0.79500002
  0.85100001  0.83700001  0.85000002  0.79900002  0.84100002  0.79699999
  0.838       0.847       0.84899998  0.83700001  0.80299997  0.85399997
  0.84500003  0.83399999  0.83200002  0.80900002  0.85500002  0.83899999
  0.79900002  0.83399999  0.81        0.79100001  0.81800002  0.82200003
  0.79100001  0.83700001  0.83600003  0.824       0.829       0.82800001
  0.83700001  0.85799998  0.81999999  0.84299999  0.83999997]

When I used only 5 numbers as an input and printed the actual weights I got this:

array([[ 1.,  0., -0.,  0.,  0.],
       [ 0.,  1.,  0., -0., -0.],
       [-0.,  0.,  1.,  0.,  0.],
       [ 0., -0.,  0.,  1., -0.],
       [ 0., -0.,  0., -0.,  1.]], dtype=float32)
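For completeness, a sketch of how that smaller check could be reproduced (same script as above, reusing its imports, with the input size reduced to 5; the rounding just makes the near-identity structure easy to see):

X5 = np.array([[random.random() for r in xrange(5)] for i in xrange(1, 100000)])
model5 = Sequential([Dense(5, input_dim=5)])
model5.compile(optimizer='sgd', loss='mean_squared_error')
model5.fit(X5, X5, nb_epoch=100, batch_size=80, validation_split=0.3)

# With only 5 inputs the full weight matrix is small enough to print directly
print np.round(model5.layers[0].get_weights()[0], 3)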
