What is cross-entropy?

Problem Description

I know that there are a lot of explanations of what cross-entropy is, but I'm still confused.

Is it only a method to describe the loss function? Can we use the gradient descent algorithm to find its minimum?

Recommended Answer

Cross-entropy is commonly used to quantify the difference between two probability distributions. Usually the "true" distribution (the one that your machine learning algorithm is trying to match) is expressed in terms of a one-hot distribution.

For example, suppose for a specific training instance, the label is B (out of the possible labels A, B, and C). The one-hot distribution for this training instance is therefore:

Pr(Class A)  Pr(Class B)  Pr(Class C)
        0.0          1.0          0.0
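
As a small illustration, here is one way to build such a one-hot vector in Python (the label list is taken from the example above):

labels = ["A", "B", "C"]   # possible classes
target = "B"               # true label for this training instance
one_hot = [1.0 if label == target else 0.0 for label in labels]
print(one_hot)             # [0.0, 1.0, 0.0]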

You can interpret the above "true" distribution to mean that the training instance has 0% probability of being class A, 100% probability of being class B, and 0% probability of being class C.

Now, suppose your machine learning algorithm predicts the following probability distribution:

Pr(Class A)  Pr(Class B)  Pr(Class C)
      0.228        0.619        0.153

How close is the predicted distribution to the true distribution? That is what the cross-entropy loss determines. Use this formula:

H(p, q) = - Σ_x p(x) * ln(q(x))

where p(x) is the desired ("true") probability and q(x) is the predicted probability. The sum runs over the three classes A, B, and C. In this case the loss is 0.479:

H = - (0.0*ln(0.228) + 1.0*ln(0.619) + 0.0*ln(0.153)) = 0.479
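
For reference, here is a minimal Python sketch that reproduces this calculation (the probabilities are the ones from the example above):

import math

def cross_entropy(p, q):
    # H(p, q) = - sum_x p(x) * ln(q(x)); terms with p(x) == 0 contribute
    # nothing, and skipping them avoids evaluating ln(0) if some q(x) is 0.
    return -sum(p_x * math.log(q_x) for p_x, q_x in zip(p, q) if p_x > 0)

p = [0.0, 1.0, 0.0]          # true one-hot distribution (classes A, B, C)
q = [0.228, 0.619, 0.153]    # predicted distribution
print(cross_entropy(p, q))   # ~0.479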

So that is how "wrong" or "far away" your prediction is from the true distribution.

Cross-entropy is one of many possible loss functions (another popular one is SVM hinge loss). These loss functions are typically written as J(theta) and can be used within gradient descent, an iterative algorithm that moves the parameters (or coefficients) towards their optimum values. In the standard update rule below, you would replace J(theta) with H(p, q). But note that you need to compute the derivative of H(p, q) with respect to the parameters first:

theta := theta - alpha * dJ(theta)/dtheta    (alpha is the learning rate; repeat until convergence)
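
As a toy illustration, here is a minimal Python/NumPy sketch of gradient descent on the cross-entropy loss for a linear softmax classifier. The feature values, learning rate, and variable names are illustrative assumptions, not part of the original answer:

import numpy as np

def softmax(z):
    z = z - z.max()              # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum()

x = np.array([1.0, 2.0])         # input features (made-up values)
p = np.array([0.0, 1.0, 0.0])    # one-hot target: class B
theta = np.zeros((3, 2))         # one weight row per class (assumed model)
alpha = 0.1                      # learning rate

for step in range(100):
    q = softmax(theta @ x)       # predicted distribution
    # For softmax + cross-entropy, the gradient w.r.t. the logits is q - p,
    # so the gradient w.r.t. theta is outer(q - p, x).
    theta -= alpha * np.outer(q - p, x)

print(softmax(theta @ x))        # probability of class B approaches 1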

So to answer your original questions directly:

Is it only a method to describe the loss function?

Correct, cross-entropy describes the loss between two probability distributions. It is one of many possible loss functions.

Then we can use, for example, the gradient descent algorithm to find the minimum.

Yes, the cross-entropy loss function can be used as part of gradient descent.

Further reading: one of my other answers related to TensorFlow.
