What should I use as target vector when I use BinaryCrossentropy(from_logits=True) in tensorflow.keras


Problem description

I have a multi-label classification problem in which each target is a vector of ones and zeros that are not mutually exclusive (for the sake of clarity, my target is something like [0, 1, 0, 0, 1, 1, ... ]).

My understanding so far is:

  • I should use the binary cross-entropy function (as explained in this answer).

Also, I understand that tf.keras.losses.BinaryCrossentropy() is a wrapper around TensorFlow's sigmoid_cross_entropy_with_logits, which can be used either with from_logits=True or False (as explained in this question).

Since sigmoid_cross_entropy_with_logits performs the sigmoid itself, it expects its input to be raw logits in the [-inf, +inf] range.
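To see why computing the loss directly on logits matters, here is a pure-Python sketch (not the TensorFlow source, just the algebraically equivalent formula that sigmoid_cross_entropy_with_logits is documented to use) contrasted with the naive "sigmoid first, then log" version:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def naive_bce(x, z):
    """Apply sigmoid first, then cross-entropy on the probability.
    Underflows for large |x|: sigmoid saturates to exactly 0.0 or 1.0
    in floating point, and log(0) blows up."""
    p = sigmoid(x)
    return -(z * math.log(p) + (1 - z) * math.log(1 - p))

def stable_bce_with_logits(x, z):
    """The numerically stable, algebraically equivalent form:
    max(x, 0) - x*z + log(1 + exp(-|x|)).
    The exp argument is always <= 0, so it never overflows."""
    return max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))

# Both agree for moderate logits:
print(naive_bce(2.0, 1.0), stable_bce_with_logits(2.0, 1.0))
# But only the stable form survives extreme logits;
# naive_bce(100.0, 0.0) would raise a math domain error:
print(stable_bce_with_logits(100.0, 0.0))  # → 100.0
```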

When the network itself applies a sigmoid activation on the last layer, tf.keras.losses.BinaryCrossentropy() must be used with from_logits=False. It then inverts the sigmoid function and passes the result to sigmoid_cross_entropy_with_logits, which applies the sigmoid again. However, this can cause numerical issues due to the asymptotes of the sigmoid/logit functions.

To improve the numerical stability, we can drop the last sigmoid layer and use tf.keras.losses.BinaryCrossentropy(from_logits=True).
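A minimal sketch of this setup (the layer sizes and input shape here are placeholders, not from the original question): the last Dense layer has no activation, so it outputs raw logits, and the loss is told so via from_logits=True.

```python
import tensorflow as tf

# Hypothetical architecture for a 6-label problem with 20 input features;
# the only point that matters is the linear (activation-free) last layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(6),  # no sigmoid here: outputs are raw logits
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
)
```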

Question:

If we use tf.keras.losses.BinaryCrossentropy(from_logits=True), what target should I use? Do I need to change my target vectors of ones and zeros?

I suppose I should then apply a sigmoid activation to the network output at inference time. Is there a way to add a sigmoid layer that is active only in inference mode and not in training mode?

Answer

First, let me give some notes about numerical stability:

As mentioned in the comments section, the numerical instability in the from_logits=False case comes from transforming probability values back into logits, which involves a clipping operation (as discussed in this question and its answer). However, to the best of my knowledge, this does NOT create any serious issues for most practical applications (although there are some cases where applying the softmax/sigmoid function inside the loss function, i.e. using from_logits=True, would be more numerically stable in terms of computing gradients; see this answer for a mathematical explanation).

In other words, if you are not concerned with the precision of generated probability values at a sensitivity below 1e-7, or with a related convergence issue observed in your experiments, then you should not worry too much; just use the sigmoid and binary cross-entropy as before, i.e. model.compile(loss='binary_crossentropy', ...), and it will work fine.
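Where does the 1e-7 figure come from? A pure-Python sketch of the clipping implied by from_logits=False (assuming Keras's default backend epsilon of 1e-7) shows that probabilities closer to 0 or 1 than epsilon are indistinguishable after clipping:

```python
import math

EPS = 1e-7  # Keras's default backend epsilon (assumes the default config)

def bce_from_probs(p, y):
    """Binary cross-entropy on a probability, with the clipping that
    from_logits=False implies: p is clipped into [EPS, 1 - EPS]
    before the logarithm, so the loss is capped at -log(EPS) ~= 16.12."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# A predicted probability of 1e-9 is indistinguishable from 1e-7
# once clipped -- this is the lost sensitivity mentioned above:
print(bce_from_probs(1e-9, 1) == bce_from_probs(1e-7, 1))  # → True
```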

All in all, if you are really concerned with numerical stability, you can take the safest path and use from_logits=True without any activation function on the last layer of the model.

Now, to answer the original question: the true labels or target values (i.e. y_true) should still be only zeros and ones when using BinaryCrossentropy(from_logits=True). Rather, it is y_pred (i.e. the output of the model) that should not be a probability distribution in this case (i.e. the sigmoid function should not be used on the last layer when from_logits=True).
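Putting it together in a small sketch (the target and logit values below are made-up placeholders): the targets stay multi-hot vectors of zeros and ones, the loss consumes raw logits, and a sigmoid is applied explicitly only when probabilities are needed at inference time.

```python
import tensorflow as tf

# Targets are unchanged: multi-hot vectors of zeros and ones.
y_true = tf.constant([[0.0, 1.0, 0.0, 1.0]])
# Raw model output (no sigmoid on the last layer); values are made up.
logits = tf.constant([[-1.2, 3.4, -0.5, 0.8]])

loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)(y_true, logits)

# At inference time, apply the sigmoid explicitly to get probabilities:
probs = tf.sigmoid(logits)

# Alternatively, one can wrap an already trained model (hypothetical
# `trained_model`) so the sigmoid only exists in the inference graph:
# inference_model = tf.keras.Sequential(
#     [trained_model, tf.keras.layers.Activation("sigmoid")])
```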

