Keras: binary_crossentropy & categorical_crossentropy confusion


Question

After using TensorFlow for quite a while I have read some Keras tutorials and implemented some examples. I have found several tutorials for convolutional autoencoders that use keras.losses.binary_crossentropy as the loss function.

I thought binary_crossentropy should not be a multi-class loss function and would most likely use binary labels, but in fact Keras (TF Python backend) calls tf.nn.sigmoid_cross_entropy_with_logits, which actually is intended for classification tasks with multiple, independent classes that are not mutually exclusive.
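
For a concrete check, here is a minimal sketch (the tf.keras API is an assumption on my part; the keras.losses version in the question computes the same thing) showing that binary_crossentropy is element-wise sigmoid cross-entropy averaged over the last axis, so each label is treated independently:

    import numpy as np
    import tensorflow as tf

    # Three independent binary labels per sample, e.g. a multi-label task
    # or pixel intensities in [0, 1] for an autoencoder.
    y_true = np.array([[1.0, 0.0, 1.0]])
    y_pred = np.array([[0.9, 0.2, 0.7]])  # independent sigmoid outputs

    # Keras: element-wise binary cross-entropy, averaged over the last axis.
    keras_loss = tf.keras.losses.binary_crossentropy(y_true, y_pred).numpy()

    # Manual: -[y*log(p) + (1-y)*log(1-p)], averaged per sample.
    manual_loss = -np.mean(
        y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred), axis=-1
    )

    print(keras_loss, manual_loss)  # both ≈ [0.2284]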

On the other hand, I expected categorical_crossentropy to be intended for multi-class classifications where target classes have a dependency on each other, but are not necessarily one-hot encoded.

However, the Keras documentation states:

(...) when using the categorical_crossentropy loss, your targets should be in categorical format (e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all-zeros except for a 1 at the index corresponding to the class of the sample).

If I am not mistaken, this is just the special case of one-hot encoded classification tasks, but the underlying cross-entropy loss also works with probability distributions ("multi-class", dependent labels)?

Additionally, Keras uses tf.nn.softmax_cross_entropy_with_logits (TF Python backend) for the implementation, which itself states:

NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.
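
The quoted note is easy to verify: in the sketch below (tf.keras assumed, hypothetical numbers), the target row is a valid probability distribution but not a one-hot vector, and categorical_crossentropy still computes the expected cross-entropy:

    import numpy as np
    import tensorflow as tf

    # A soft target: a valid probability distribution, but not one-hot.
    y_true = np.array([[0.7, 0.2, 0.1]])
    y_pred = np.array([[0.6, 0.3, 0.1]])  # softmax output, rows sum to 1

    keras_loss = tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy()

    # Manual cross-entropy: -sum(y_true * log(y_pred)) per row.
    manual_loss = -np.sum(y_true * np.log(y_pred), axis=-1)

    print(keras_loss, manual_loss)  # both ≈ [0.8286]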

Please correct me if I am wrong, but it looks to me that the Keras documentation is - at least - not very "detailed"?!

So, what is the idea behind Keras' naming of the loss functions? Is the documentation correct? If binary cross-entropy really relied on binary labels, it should not work for autoencoders, right?! Likewise for categorical cross-entropy: if the documentation is correct, it should only work for one-hot encoded labels?!

Answer

You are right in defining the areas where each of these losses is applicable:

  • binary_crossentropy (and tf.nn.sigmoid_cross_entropy_with_logits under the hood) is for binary multi-label classification (labels are independent); see the compile sketch after this list.
  • categorical_crossentropy (and tf.nn.softmax_cross_entropy_with_logits under the hood) is for multi-class classification (classes are exclusive).
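
As a usage sketch (hypothetical layer sizes, tf.keras API assumed), the two losses pair with different output activations:

    import tensorflow as tf

    # Multi-label head: independent sigmoid units + binary_crossentropy.
    multi_label = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(5, activation="sigmoid"),  # 5 independent labels
    ])
    multi_label.compile(optimizer="adam", loss="binary_crossentropy")

    # Multi-class head: softmax over exclusive classes + categorical_crossentropy.
    multi_class = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(5, activation="softmax"),  # 5 exclusive classes
    ])
    multi_class.compile(optimizer="adam", loss="categorical_crossentropy")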

Also see the detailed analysis in this question.

I'm not sure what tutorials you mean, so I can't comment on whether binary_crossentropy is a good or bad choice for autoencoders.
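
That said, such tutorials typically follow something like this minimal sketch (hypothetical sizes; inputs scaled to [0, 1]): with a sigmoid output layer, each reconstructed value is treated as an independent Bernoulli probability, which is why binary_crossentropy is at least a plausible choice there.

    import tensorflow as tf

    # Minimal dense autoencoder for inputs scaled to [0, 1] (e.g. MNIST pixels).
    inputs = tf.keras.Input(shape=(784,))
    encoded = tf.keras.layers.Dense(32, activation="relu")(inputs)
    decoded = tf.keras.layers.Dense(784, activation="sigmoid")(encoded)

    autoencoder = tf.keras.Model(inputs, decoded)
    # Each output unit is an independent "pixel-on" probability, so the
    # element-wise binary cross-entropy applies per pixel.
    autoencoder.compile(optimizer="adam", loss="binary_crossentropy")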

As for the naming, it is absolutely correct and reasonable. Or do you think the names sigmoid and softmax would sound better?

So the only confusion left in your question is the categorical_crossentropy documentation. Note that everything stated there is correct: the loss does support one-hot representations. The function indeed also works with any probability distribution over labels (not just one-hot vectors) in the case of the TensorFlow backend, and this could be mentioned in the docs, but it doesn't look critical to me. Moreover, one would need to check whether soft classes are supported in the other backends, Theano and CNTK. Remember that Keras tries to be minimalistic and targets the most popular use cases, so I can understand the logic here.
