How to choose cross-entropy loss in TensorFlow?


Question

Classification problems, such as logistic regression or multinomial logistic regression, optimize a cross-entropy loss. Normally, the cross-entropy layer follows the softmax layer, which produces a probability distribution.
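For concreteness, here is a minimal numpy sketch of that pipeline, with made-up numbers: softmax turns raw scores into a distribution, and cross-entropy compares it to a one-hot label.

    import numpy as np

    logits = np.array([2.0, 1.0, 0.1])               # raw scores from the last dense layer
    probs = np.exp(logits) / np.sum(np.exp(logits))  # softmax: a probability distribution
    label = np.array([1.0, 0.0, 0.0])                # one-hot ground truth
    loss = -np.sum(label * np.log(probs))            # cross-entropy, ~0.42 here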

In TensorFlow, there are at least a dozen different cross-entropy loss functions:

  • tf.losses.softmax_cross_entropy
  • tf.losses.sparse_softmax_cross_entropy
  • tf.losses.sigmoid_cross_entropy
  • tf.contrib.losses.softmax_cross_entropy
  • tf.contrib.losses.sigmoid_cross_entropy
  • tf.nn.softmax_cross_entropy_with_logits
  • tf.nn.sigmoid_cross_entropy_with_logits
  • ...

Which ones work only for binary classification, and which are suitable for multi-class problems? When should you use sigmoid instead of softmax? How are the sparse functions different from the others, and why do they exist only for softmax?

Related (more math-oriented) discussion: What are the differences between all these cross-entropy losses in Keras and TensorFlow?

Answer

Preliminary facts

In the functional sense, the sigmoid is a partial case of the softmax function, when the number of classes equals 2. Both of them do the same operation: transform the logits (see below) into probabilities.

In simple binary classification there is no big difference between the two; however, in the case of multinomial classification, sigmoid allows dealing with non-exclusive labels (a.k.a. multi-labels), while softmax deals with exclusive classes (see below).
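A quick numpy check of that equivalence: the sigmoid of a single logit x gives the same probability as a two-class softmax over [0, x]. Purely illustrative.

    import numpy as np

    x = 1.3                                                  # a single logit
    sigmoid = 1.0 / (1.0 + np.exp(-x))                       # probability of the positive class
    softmax2 = np.exp([0.0, x]) / np.sum(np.exp([0.0, x]))   # softmax over two classes
    assert np.isclose(sigmoid, softmax2[1])                  # same probability, two parametrizations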

A logit (also called a score) is a raw, unscaled value associated with a class, before the probability is computed. In terms of neural network architecture, this means that a logit is the output of a dense (fully-connected) layer.

TensorFlow naming is a bit strange: all of the functions below accept logits, not probabilities, and apply the transformation themselves (which is simply more efficient).
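To make the naming concrete, here is a sketch (assuming the TF 1.x graph API, with arbitrary layer sizes) of where logits come from and how they are passed, unscaled, to a loss function:

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 784])        # input features
    labels = tf.placeholder(tf.float32, [None, 10])    # one-hot targets
    hidden = tf.layers.dense(x, 128, activation=tf.nn.relu)
    logits = tf.layers.dense(hidden, 10)                # no activation: these are the logits
    # pass the raw logits, not tf.nn.softmax(logits), to the loss:
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))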

Sigmoid functions family

As stated earlier, the sigmoid loss function is for binary classification. But the TensorFlow functions are more general and also allow multi-label classification, where the classes are independent. In other words, tf.nn.sigmoid_cross_entropy_with_logits solves N binary classifications at once.

The labels must be one-hot encoded or can contain soft class probabilities.

tf.losses.sigmoid_cross_entropy additionally allows setting in-batch weights, i.e. making some examples more important than others. tf.nn.weighted_cross_entropy_with_logits allows setting class weights (remember, the classification is binary), i.e. making positive errors cost more than negative errors. This is useful when the training data is unbalanced.
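A sketch of the three sigmoid-family calls mentioned above, assuming the TF 1.x API; the shapes and weight values are made up for illustration:

    import tensorflow as tf

    logits = tf.placeholder(tf.float32, [None, 5])     # 5 independent binary problems
    labels = tf.placeholder(tf.float32, [None, 5])     # multi-hot (or soft) targets

    # N binary classifications at once, elementwise over the 5 outputs
    loss1 = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)

    # same loss, plus in-batch weights: one weight per example
    example_weights = tf.placeholder(tf.float32, [None, 1])
    loss2 = tf.losses.sigmoid_cross_entropy(multi_class_labels=labels, logits=logits,
                                            weights=example_weights)

    # class weight for the binary case: positive errors count pos_weight times more
    loss3 = tf.nn.weighted_cross_entropy_with_logits(targets=labels, logits=logits,
                                                     pos_weight=3.0)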

Softmax functions family

These loss functions (tf.nn.softmax_cross_entropy_with_logits, tf.losses.softmax_cross_entropy and the tf.contrib counterpart) should be used for multinomial, mutually exclusive classification, i.e. picking one out of N classes. They are also applicable when N = 2.

The labels must be one-hot encoded or can contain soft class probabilities: a particular example can belong to class A with 50% probability and to class B with 50% probability. Note that, strictly speaking, this does not mean that it belongs to both classes, but one can interpret the probabilities this way.

Just like in the sigmoid family, tf.losses.softmax_cross_entropy allows setting in-batch weights, i.e. making some examples more important than others. As far as I know, as of TensorFlow 1.3 there is no built-in way to set class weights.
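A sketch of the softmax-family calls, again assuming the TF 1.x API with made-up shapes; note that the labels can be strictly one-hot or soft:

    import tensorflow as tf

    logits = tf.placeholder(tf.float32, [None, 3])     # scores for 3 exclusive classes
    onehot = tf.placeholder(tf.float32, [None, 3])     # e.g. [0, 1, 0], or soft: [0.5, 0.5, 0]

    # per-example cross-entropy, shape [batch_size]; reduce it yourself
    xent = tf.nn.softmax_cross_entropy_with_logits(labels=onehot, logits=logits)
    loss = tf.reduce_mean(xent)

    # tf.losses version: reduces for you and accepts in-batch (per-example) weights
    example_weights = tf.placeholder(tf.float32, [None])
    loss2 = tf.losses.softmax_cross_entropy(onehot_labels=onehot, logits=logits,
                                            weights=example_weights)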

[UPD] In TensorFlow 1.5, the v2 version (tf.nn.softmax_cross_entropy_with_logits_v2) was introduced and the original softmax_cross_entropy_with_logits loss was deprecated. The only difference between them is that in the newer version, backpropagation happens into both the logits and the labels (here's a discussion of why this may be useful).
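A sketch of the switch, assuming TF 1.5+ and reusing onehot and logits from the snippet above; wrapping the labels in tf.stop_gradient recovers the old behavior when the labels are themselves produced by the graph:

    # new version: gradients also flow into the labels (useful when the labels are learned)
    xent_v2 = tf.nn.softmax_cross_entropy_with_logits_v2(labels=onehot, logits=logits)

    # to keep the old behavior, explicitly block the gradient into the labels
    xent_old = tf.nn.softmax_cross_entropy_with_logits_v2(labels=tf.stop_gradient(onehot),
                                                          logits=logits)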

Sparse functions family

Like the ordinary softmax above, these loss functions (tf.nn.sparse_softmax_cross_entropy_with_logits, tf.losses.sparse_softmax_cross_entropy) should be used for multinomial, mutually exclusive classification, i.e. picking one out of N classes. The difference is in the label encoding: the classes are specified as integers (class indices), not one-hot vectors. Obviously, this doesn't allow soft classes, but it can save some memory when there are thousands or millions of classes. However, note that the logits argument must still contain logits for every class, so it consumes at least [batch_size, classes] memory.

Like above, the tf.losses version has a weights argument which allows setting in-batch weights.
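The same multiclass setup with integer class indices instead of one-hot vectors; a TF 1.x sketch with assumed shapes:

    import tensorflow as tf

    logits = tf.placeholder(tf.float32, [None, 1000])   # still one logit per class
    class_ids = tf.placeholder(tf.int64, [None])        # a single integer label per example

    xent = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=class_ids, logits=logits)

    # tf.losses version with optional in-batch (per-example) weights
    example_weights = tf.placeholder(tf.float32, [None])
    loss = tf.losses.sparse_softmax_cross_entropy(labels=class_ids, logits=logits,
                                                  weights=example_weights)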

Sampled softmax functions family

These functions (e.g. tf.nn.sampled_softmax_loss) provide another alternative for dealing with a huge number of classes. Instead of computing and comparing an exact probability distribution, they compute a loss estimate from a random sample.

The arguments weights and biases specify a separate fully-connected layer that is used to compute the logits for a chosen sample.

Like above, labels are not one-hot encoded, but have the shape [batch_size, num_true].

Sampled functions are only suitable for training. At test time, it's recommended to use a standard softmax loss (either sparse or one-hot) to get the actual distribution.
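A training-time sketch of tf.nn.sampled_softmax_loss under the TF 1.x API; the vocabulary size, feature dimension, and sample count are made-up numbers. Note that weights and biases here are the parameters of that separate fully-connected layer, not loss weights:

    import tensorflow as tf

    num_classes, dim, num_sampled = 1000000, 128, 64

    inputs = tf.placeholder(tf.float32, [None, dim])    # activations feeding the last layer
    labels = tf.placeholder(tf.int64, [None, 1])        # shape [batch_size, num_true]

    # the separate fully-connected layer: one weight row and one bias per class
    out_weights = tf.get_variable("out_w", [num_classes, dim])
    out_biases = tf.get_variable("out_b", [num_classes])

    # training: estimate the loss from num_sampled randomly drawn classes
    train_loss = tf.nn.sampled_softmax_loss(weights=out_weights, biases=out_biases,
                                            labels=labels, inputs=inputs,
                                            num_sampled=num_sampled,
                                            num_classes=num_classes)

    # test time: compute the full logits and use a standard (here sparse) softmax loss
    full_logits = tf.matmul(inputs, out_weights, transpose_b=True) + out_biases
    test_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=tf.squeeze(labels, axis=1), logits=full_logits)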

Another alternative loss is tf.nn.nce_loss, which performs noise-contrastive estimation (if you're interested, see this very detailed discussion). I've included this function in the softmax family because NCE guarantees approximation to softmax in the limit.
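tf.nn.nce_loss takes the same arguments as the sampled softmax above, so swapping it in is a one-line change (a sketch reusing the tensors from the previous snippet):

    # noise-contrastive estimation over the same output-layer parameters
    nce = tf.nn.nce_loss(weights=out_weights, biases=out_biases,
                         labels=labels, inputs=inputs,
                         num_sampled=num_sampled, num_classes=num_classes)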

