How does binary cross entropy loss work on autoencoders?


Problem description

I wrote a vanilla autoencoder using only Dense layers. Below is my code:

from keras.datasets import mnist
from keras.layers import Input, Dense
from keras.models import Model

# encoder: 784 -> 128 -> 64 -> 28, decoder: 28 -> 64 -> 128 -> 784
iLayer = Input((784,))
layer1 = Dense(128, activation='relu')(iLayer)
layer2 = Dense(64, activation='relu')(layer1)
layer3 = Dense(28, activation='relu')(layer2)
layer4 = Dense(64, activation='relu')(layer3)
layer5 = Dense(128, activation='relu')(layer4)
layer6 = Dense(784, activation='softmax')(layer5)
model = Model(iLayer, layer6)
model.compile(loss='binary_crossentropy', optimizer='adam')

(trainX, trainY), (testX, testY) = mnist.load_data()
print("shape of trainX", trainX.shape)
# flatten each 28x28 image into a vector of 784 pixels
trainX = trainX.reshape(trainX.shape[0], trainX.shape[1] * trainX.shape[2])
print("shape of trainX", trainX.shape)
model.fit(trainX, trainX, epochs=5, batch_size=100)

Questions:

1) softmax provides a probability distribution. Understood. This means I would have a vector of 784 values, each between 0 and 1, for example [0.02, 0.03, ... up to 784 items], and summing all 784 elements gives 1.

2) I don't understand how binary cross-entropy works with these values. Binary cross-entropy is for two output values, right?

Answer

In the context of autoencoders, the input and output of the model are the same. So, if the input values are in the range [0, 1], then it is acceptable to use sigmoid as the activation function of the last layer. Otherwise, you need to use an appropriate activation function for the last layer (e.g. linear, which is the default one).
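
A minimal sketch of that change, reusing layer5 from the code in the question (the name decoded is just illustrative):

from keras.layers import Dense

# inputs scaled to [0, 1]: reconstruct them with a sigmoid output
decoded = Dense(784, activation='sigmoid')(layer5)

# unbounded real-valued inputs: use a linear output instead
# decoded = Dense(784, activation='linear')(layer5)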

As for the loss function, it again comes down to the values of the input data. If the input data are only zeros and ones (and not values in between), then binary_crossentropy is acceptable as the loss function. Otherwise, you need to use other loss functions such as 'mse' (i.e. mean squared error) or 'mae' (i.e. mean absolute error). Note that in the case of input values in the range [0, 1], binary_crossentropy can be used, as it commonly is (e.g. in the Keras autoencoder tutorial and this paper). However, don't expect the loss value to reach zero, since binary_crossentropy does not return zero when both prediction and label are not either zero or one (regardless of whether they are equal). Here is a video from Hugo Larochelle where he explains the loss functions used in autoencoders (the part about using binary_crossentropy with inputs in the range [0, 1] starts at 5:30).
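
A quick numeric check of that last point (plain NumPy, not part of the original answer): even when the prediction exactly equals the target, the binary cross-entropy is only zero if the target itself is 0 or 1.

import numpy as np

def bce(y, p):
    # element-wise binary cross-entropy: -y*log(p) - (1-y)*log(1-p)
    return -y * np.log(p) - (1 - y) * np.log(1 - p)

print(bce(1.0, 1.0 - 1e-7))   # target 1, near-perfect prediction -> loss ~ 0
print(bce(0.5, 0.5))          # perfect prediction of 0.5 -> loss ~ 0.693, not 0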

Concretely, in your example you are using the MNIST dataset, whose values are by default integers in the range [0, 255]. Usually you need to normalize them first:

# scale pixel values from [0, 255] to [0, 1]
trainX = trainX.astype('float32')
trainX /= 255.

Now the values are in the range [0, 1], so sigmoid can be used as the activation function of the last layer and either binary_crossentropy or mse as the loss function.
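
Putting the pieces together, here is a runnable sketch of the corrected setup (sigmoid output plus binary_crossentropy; the layer sizes come from the question, the variable names are illustrative):

from keras.datasets import mnist
from keras.layers import Input, Dense
from keras.models import Model

(trainX, _), (testX, _) = mnist.load_data()
# flatten each 28x28 image and scale the pixels to [0, 1]
trainX = trainX.reshape(-1, 784).astype('float32') / 255.

inputs = Input((784,))
h = Dense(128, activation='relu')(inputs)
h = Dense(64, activation='relu')(h)
h = Dense(28, activation='relu')(h)              # bottleneck, as in the question
h = Dense(64, activation='relu')(h)
h = Dense(128, activation='relu')(h)
outputs = Dense(784, activation='sigmoid')(h)    # sigmoid instead of softmax

autoencoder = Model(inputs, outputs)
autoencoder.compile(loss='binary_crossentropy', optimizer='adam')   # or loss='mse'
autoencoder.fit(trainX, trainX, epochs=5, batch_size=100)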

Why can binary_crossentropy be used even when the true label values (i.e. ground truth) are in the range [0, 1]?

Note that we are trying to minimize the loss function during training. So if the loss function we use reaches its minimum value (which is not necessarily zero) when the prediction equals the true label, then it is an acceptable choice. Let's verify that this is the case for binary cross-entropy, which is defined as follows:

bce_loss = -y*log(p) - (1-y)*log(1-p)

where y is the true label and p is the predicted value. Let's consider y as fixed and see what value of p minimizes this function: we need to take the derivative with respect to p (I have assumed that log is the natural logarithm, for simplicity of calculation):

bce_loss_derivative = -y*(1/p) - (1-y)*(-1/(1-p)) = 0 =>
                      -y/p + (1-y)/(1-p) = 0 =>
                      -y*(1-p) + (1-y)*p = 0 =>
                      -y + y*p + p - y*p = 0 =>
                       p - y = 0 => y = p

As you can see, binary cross-entropy has its minimum value when y = p, i.e. when the true label equals the predicted value, and this is exactly what we are looking for.
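
As a small numerical check of this derivation (not from the original answer), we can evaluate the loss over a grid of p values for a fixed non-binary y and confirm that the minimum sits at p = y:

import numpy as np

y = 0.3                                  # a fixed, non-binary true label
p = np.linspace(0.001, 0.999, 999)       # candidate predictions
loss = -y * np.log(p) - (1 - y) * np.log(1 - p)

print(p[np.argmin(loss)])                # ~0.3, i.e. the minimum is at p = y
print(loss.min())                        # ~0.61, the minimum value is not zero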
