How to handle log(0) when using cross entropy
Question
To keep the case simple and intuitive, I will use binary (0 and 1) classification for illustration.
Loss function
loss = np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY))  # cross entropy
cost = -np.sum(loss)/m  # num of examples in batch is m
The probability predY is computed using a sigmoid, and logits can be thought of as the output of the neural network before it reaches the classification step:
predY = sigmoid(logits)  # binary case

def sigmoid(X):
    return 1/(1 + np.exp(-X))
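In double precision, the sigmoid saturates to exactly 1.0 (or 0.0) once the logit is large enough, which is what makes the log term blow up. A quick sketch of the saturation (the logit value 40 is just an illustrative choice):

```python
import numpy as np

def sigmoid(X):
    return 1/(1 + np.exp(-X))

# exp(-40) ~ 4e-18 is smaller than float64's machine epsilon (~2.2e-16),
# so 1/(1 + exp(-40)) rounds to exactly 1.0.
p = sigmoid(40.0)
print(p == 1.0)           # True: the probability has saturated

# The cross-entropy term for the "0" class then evaluates log(0) = -inf.
with np.errstate(divide='ignore'):
    print(np.log(1 - p))  # -inf
```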
Problem
Suppose we are running a feed-forward net.
Inputs: [3, 5]: 3 is the number of examples and 5 is the feature size (fabricated data)
Number of hidden units: 100 (only 1 hidden layer)
Iterations: 10000
This arrangement is set up to overfit. When it overfits, we can perfectly predict the probability for the training examples; in other words, the sigmoid outputs exactly 1 or 0 because the exponential explodes. In that case we would hit np.log(0), which is undefined. How do you usually handle this issue?
Answer
If you don't mind the dependency on scipy, you can use scipy.special.xlogy. You would replace the expression
np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY))
with
xlogy(Y, predY) + xlogy(1 - Y, 1 - predY)
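A minimal sketch of the swap on fabricated data. The key property is that xlogy returns 0 whenever its first argument is 0, even if the second argument is also 0, so a saturated-but-correct prediction no longer produces nan:

```python
import numpy as np
from scipy.special import xlogy

Y = np.array([1.0, 0.0, 1.0])
predY = np.array([1.0, 0.0, 0.9])  # first two predictions are saturated to 1 and 0
m = Y.size

# xlogy(0, 0) == 0, so the saturated entries contribute nothing instead of nan.
loss = xlogy(Y, predY) + xlogy(1 - Y, 1 - predY)
cost = -np.sum(loss) / m
print(np.isfinite(cost))  # True: only the 0.9 prediction contributes
```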
If you expect predY to contain very small values, you might get better numerical results using scipy.special.xlog1py in the second term:
xlogy(Y, predY) + xlog1py(1 - Y, -predY)
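The difference matters when predY is tiny: computing 1 - predY first rounds the result to exactly 1.0, losing information that log1p preserves. A sketch (the value 1e-20 is just an illustrative tiny probability):

```python
import numpy as np
from scipy.special import xlog1py

p = 1e-20                  # a prediction far below float64's machine epsilon
# Naive form: 1 - p rounds to exactly 1.0, so the log term collapses to 0.
print(np.log(1 - p))       # 0.0
# xlog1py evaluates 1.0 * log1p(-p), which keeps the tiny value (about -1e-20).
print(xlog1py(1.0, -p))
```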
Alternatively, knowing that the values in Y are either 0 or 1, you can compute the cost in an entirely different way:
Yis1 = Y == 1
cost = -(np.log(predY[Yis1]).sum() + np.log(1 - predY[~Yis1]).sum())/m
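A quick usage sketch of this variant (fabricated data): predY[Yis1] selects the probabilities assigned to the positive examples and predY[~Yis1] those of the negatives, so each example contributes the log of the probability of its true class, and a saturated-but-correct prediction contributes log(1) = 0 rather than log(0):

```python
import numpy as np

Y = np.array([1.0, 0.0, 1.0])
predY = np.array([1.0, 0.0, 0.9])  # saturated but correct on the first two
m = Y.size

Yis1 = Y == 1
# Positive examples use log(predY); negative examples use log(1 - predY).
cost = -(np.log(predY[Yis1]).sum() + np.log(1 - predY[~Yis1]).sum()) / m
print(np.isfinite(cost))  # True
```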