Why does sigmoid & crossentropy of Keras/tensorflow have low precision?

Question

I have the following simple neural network (with only 1 neuron) to test the computation precision of the sigmoid activation and binary_crossentropy of Keras:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(1, input_dim=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

To simplify the test, I manually set the only weight to 1 and the bias to 0, and then evaluate the model on the 2-point training set {(-a, 0), (a, 1)}, i.e.

import numpy as np

model.set_weights([np.array([[1.0]]), np.array([0.0])])  # weight = 1, bias = 0
y = np.array([0, 1])
keras_ce = np.zeros(40)
my_ce = np.zeros(40)
for a in range(40):
    x = np.array([-a, a])
    keras_ce[a] = model.evaluate(x, y)[0]  # cross-entropy computed by Keras/TensorFlow
    my_ce[a] = np.log(1 + np.exp(-a))      # my own computation

My question: I found that the binary crossentropy (keras_ce) computed by Keras/TensorFlow reaches a floor of 1.09e-7 when a is approximately 16, as illustrated below (blue line). It doesn't decrease further as a keeps growing. Why is that?

This neural network has only 1 neuron, whose weight is set to 1 and bias to 0. With the 2-point training set {(-a, 0), (a, 1)}, the binary_crossentropy is just

-1/2 [log(1 - 1/(1 + exp(a))) + log(1/(1 + exp(-a)))] = log(1 + exp(-a))

So the cross-entropy should keep decreasing as a increases, as illustrated in orange ('my') above. Is there some Keras/TensorFlow/Python setting I can change to increase its precision? Or am I mistaken somewhere? I'd appreciate any suggestions/comments/answers.
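
As a sanity check of that closed form, the value at, say, a = 20 can be computed directly in float64 with plain numpy (independent of Keras):

import numpy as np

a = 20.0
# Closed-form per-sample cross-entropy from the formula above
print(np.log1p(np.exp(-a)))  # ~2.06e-09, already far below the observed 1.09e-7 floor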

Answer

TL;DR version: the probability values (i.e. the outputs of the sigmoid function) are clipped, for numerical stability, when computing the loss function.

If you inspect the source code, you would find that using binary_crossentropy as the loss results in a call to the binary_crossentropy function in the losses.py file (https://github.com/keras-team/keras/blob/5a7a789ee9766b6a594bd4be8b9edb34e71d6500/keras/losses.py#L76):

def binary_crossentropy(y_true, y_pred):
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)

which in turn, as you can see, calls the equivalent backend function. With TensorFlow as the backend, that results in a call to the binary_crossentropy function in the tensorflow_backend.py file (https://github.com/keras-team/keras/blob/5a7a789ee9766b6a594bd4be8b9edb34e71d6500/keras/backend/tensorflow_backend.py#L3275):

def binary_crossentropy(target, output, from_logits=False):
    """ Docstring ..."""

    # Note: tf.nn.sigmoid_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        # transform back to logits
        _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
        output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
        output = tf.log(output / (1 - output))

    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target,
                                                   logits=output)

As you can see, the from_logits argument is set to False by default. Therefore, the if condition evaluates to true and, as a result, the values in output are clipped to the range [epsilon, 1 - epsilon]. That's why, no matter how small or large a probability is, it can never be smaller than epsilon or greater than 1 - epsilon. And that explains why the output of the binary_crossentropy loss is bounded as well.
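
To see where that floor comes from, here is a minimal sketch in plain float64 numpy that mirrors the clip-and-transform logic above for a saturated positive example; the ~1e-7 it prints matches the observed floor of 1.09e-7 up to float32 rounding inside TensorFlow:

import numpy as np

eps = 1e-7                       # Keras' default fuzz factor (see below)
p = 1.0                          # sigmoid output, saturated for large a
p = np.clip(p, eps, 1 - eps)     # clipped to 1 - 1e-7
logit = np.log(p / (1 - p))      # "transform back to logits", ~16.12
loss = np.log1p(np.exp(-logit))  # sigmoid cross-entropy with label 1
print(loss)                      # ~1.0e-07, no matter how large a gets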

Now, what is this epsilon? It is a very small constant used for numerical stability (e.g. to prevent division by zero or other undefined behavior). To find out its value you can inspect the source code further; you would find it in the common.py file:

_EPSILON = 1e-7

def epsilon():
    """Returns the value of the fuzz factor used in numeric expressions.
    # Returns
        A float.
    # Example
    ```python
        >>> keras.backend.epsilon()
        1e-07
    ```
    """
    return _EPSILON

If, for any reason, you would like more precision, you can set the epsilon value to a smaller constant using the set_epsilon function from the backend:

def set_epsilon(e):
    """Sets the value of the fuzz factor used in numeric expressions.
    # Arguments
        e: float. New value of epsilon.
    # Example
    ```python
        >>> from keras import backend as K
        >>> K.epsilon()
        1e-07
        >>> K.set_epsilon(1e-05)
        >>> K.epsilon()
        1e-05
    ```
    """
    global _EPSILON
    _EPSILON = e
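
For example, a minimal usage sketch (set it before compiling the model, since, as the backend code above suggests, epsilon is read when the loss is constructed):

from keras import backend as K

K.set_epsilon(1e-9)  # choose a smaller fuzz factor before building the model
print(K.epsilon())   # 1e-09

One caveat: with the default float32 dtype, 1 - epsilon rounds back to exactly 1.0 once epsilon drops below float32's machine epsilon (about 1.2e-7), so an epsilon much smaller than the default may only help in combination with K.set_floatx('float64').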

However, be aware that setting epsilon to an extremely low positive value, or to zero, may disrupt the stability of computations all over Keras.
