How to handle log(0) when using cross entropy

Problem description

In order to make the case simple and intuitive, I will use binary (0 and 1) classification for illustration.

Loss function

import numpy as np

loss = np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY))  # cross entropy
cost = -np.sum(loss)/m  # num of examples in batch is m

Probability of Y

predY is computed using sigmoid, and logits can be thought of as the output of a neural network before reaching the classification step

predY = sigmoid(logits)  # binary case

def sigmoid(X):
    # squash each logit into the open interval (0, 1)
    return 1 / (1 + np.exp(-X))

Problem

Suppose we are running a feed-forward net.

Inputs: [3, 5]: 3 is number of examples and 5 is feature size (fabricated data)

Num of hidden units: 100 (only 1 hidden layer)

Iterations: 10000
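
For reference, a minimal forward pass matching these shapes might look like the sketch below; the random weights and the tanh hidden activation are stand-ins, since the original training code is not shown.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))        # 3 examples, 5 features (fabricated data)
W1, b1 = rng.normal(size=(5, 100)), np.zeros(100)  # 1 hidden layer, 100 units
W2, b2 = rng.normal(size=(100, 1)), np.zeros(1)

hidden = np.tanh(X @ W1 + b1)      # hidden activation (tanh is an assumption)
logits = hidden @ W2 + b2          # pre-classification outputs
predY = 1/(1 + np.exp(-logits))    # sigmoid, as defined above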

Such an arrangement is set up to overfit. When it overfits, we can predict the probability of the training examples perfectly; in other words, sigmoid outputs exactly 1 or 0 because the exponential saturates. If this is the case, we would have np.log(0), which is undefined. How do you usually handle this issue?
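
To make the failure concrete, here is a minimal sketch with fabricated saturated outputs; the eps clipping at the end is one common generic workaround, separate from the answer given below.

import numpy as np

Y = np.array([1.0, 0.0, 1.0])
predY = np.array([1.0, 0.0, 0.9])  # sigmoid has saturated on the first two

# naive cross entropy: log(0) -> -inf, and 0 * -inf -> nan (plus RuntimeWarnings)
loss = np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY))
print(loss)  # [nan nan -0.10536052]

eps = 1e-12  # arbitrary small constant
p = np.clip(predY, eps, 1 - eps)  # keep predictions away from exact 0 and 1
print(Y*np.log(p) + (1 - Y)*np.log(1 - p))  # finite everywhere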

Answer

If you don't mind the dependency on scipy, you can use scipy.special.xlogy. You would replace the expression

np.multiply(np.log(predY), Y) + np.multiply((1 - Y), np.log(1 - predY))

with

xlogy(Y, predY) + xlogy(1 - Y, 1 - predY)

If you expect predY to contain very small values, you might get better numerical results using scipy.special.xlog1py in the second term:

xlogy(Y, predY) + xlog1py(1 - Y, -predY)
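
As a quick sanity check (with fabricated values): xlogy and xlog1py return 0 whenever their first argument is 0, so the terms that would otherwise evaluate to 0 * log(0) no longer produce NaN.

import numpy as np
from scipy.special import xlogy, xlog1py

Y = np.array([1.0, 0.0, 1.0])
predY = np.array([1.0, 0.0, 0.25])  # first two outputs have saturated

loss = xlogy(Y, predY) + xlog1py(1 - Y, -predY)
print(loss)  # [ 0.  0. -1.38629436] -- finite everywhere, no NaN or -inf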

Alternatively, knowing that the values in Y are either 0 or 1, you can compute the cost in an entirely different way:

Yis1 = Y == 1  # boolean mask selecting the positive (Y == 1) examples
cost = -(np.log(predY[Yis1]).sum() + np.log(1 - predY[~Yis1]).sum())/m
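
As a quick check with fabricated data, both formulations agree even when predY contains saturated 0s and 1s:

import numpy as np
from scipy.special import xlogy, xlog1py

Y = np.array([1.0, 0.0, 1.0, 0.0])
predY = np.array([1.0, 0.0, 0.8, 0.3])  # includes saturated 0 and 1
m = len(Y)

cost_xlogy = -np.sum(xlogy(Y, predY) + xlog1py(1 - Y, -predY))/m

Yis1 = Y == 1  # mask approach never evaluates log(0)
cost_masked = -(np.log(predY[Yis1]).sum() + np.log(1 - predY[~Yis1]).sum())/m

print(cost_xlogy, cost_masked)  # both print ~0.145, with no warnings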
