Why does sigmoid & crossentropy of Keras/tensorflow have low precision?
Question
I have the following simple neural network (with only 1 neuron) to test the computational precision of the sigmoid activation and binary_crossentropy in Keras:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(1, input_dim=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
To simplify the test, I manually set the only weight to 1 and the bias to 0, and then evaluate the model on the 2-point training set {(-a, 0), (a, 1)}, i.e.
import numpy as np

y = np.array([0, 1])
keras_ce, my_ce = np.zeros(40), np.zeros(40)
for a in range(40):
    x = np.array([-a, a])
    keras_ce[a] = model.evaluate(x, y)[0]  # cross-entropy computed by keras/tensorflow
    my_ce[a] = np.log(1 + np.exp(-a))  # my own computation
My question: I found that the binary cross-entropy (keras_ce) computed by Keras/Tensorflow reaches a floor of 1.09e-7 when a is approx. 16, as illustrated below (blue line). It doesn't decrease further as a keeps growing. Why is that?
This neural network has only 1 neuron, whose weight is set to 1 and whose bias is 0. With the 2-point training set {(-a, 0), (a, 1)}, the binary_crossentropy is just
-1/2 [ log(1 - 1/(1+exp(a)) ) + log( 1/(1+exp(-a)) ) ] = log(1+exp(-a))
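For reference, the simplification above can be checked step by step. This is only a worked restatement of the same formula, using the identity 1 - 1/(1+exp(a)) = 1/(1+exp(-a)):

```latex
\begin{aligned}
-\tfrac{1}{2}\left[\log\left(1-\frac{1}{1+e^{a}}\right)+\log\left(\frac{1}{1+e^{-a}}\right)\right]
&= -\tfrac{1}{2}\left[\log\frac{e^{a}}{1+e^{a}}+\log\frac{1}{1+e^{-a}}\right] \\
&= -\tfrac{1}{2}\left[\log\frac{1}{1+e^{-a}}+\log\frac{1}{1+e^{-a}}\right] \\
&= \log\left(1+e^{-a}\right)
\end{aligned}
```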
So the cross-entropy should decrease as a increases, as illustrated in orange ('my') above. Is there some Keras/Tensorflow/Python setting I can change to increase its precision? Or am I mistaken somewhere? I'd appreciate any suggestions/comments/answers.
Recommended answer
TL;DR version: the probability values (i.e. the outputs of the sigmoid function) are clipped for numerical stability when computing the loss function.
If you inspect the source code, you will find that using binary_crossentropy as the loss results in a call to the binary_crossentropy function in the losses.py file (https://github.com/keras-team/keras/blob/5a7a789ee9766b6a594bd4be8b9edb34e71d6500/keras/losses.py#L76):
def binary_crossentropy(y_true, y_pred):
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)
which in turn, as you can see, calls the equivalent backend function. When Tensorflow is the backend, that results in a call to the binary_crossentropy function in the tensorflow_backend.py file:
def binary_crossentropy(target, output, from_logits=False):
    """ Docstring ..."""
    # Note: tf.nn.sigmoid_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        # transform back to logits
        _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
        output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
        output = tf.log(output / (1 - output))
    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target,
                                                   logits=output)
As you can see, the from_logits argument is set to False by default. Therefore, the if condition evaluates to true, and as a result the values in the output are clipped to the range [epsilon, 1-epsilon]. That's why, no matter how small or large a probability is, it cannot be smaller than epsilon or greater than 1-epsilon. And that explains why the output of the binary_crossentropy loss is also bounded.
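The effect of the clipping can be reproduced outside Keras. Below is a minimal NumPy sketch, not the Keras implementation: `clipped_bce` and `sigmoid` are hypothetical helper names, the computation is done in float64, and it skips the round trip back to logits that the backend performs. It only shows that clipping to [epsilon, 1-epsilon] puts a floor of roughly 1e-7 under the loss.

```python
import numpy as np

EPS = 1e-7  # assumed to match the default value returned by keras.backend.epsilon()

def sigmoid(z):
    """Plain logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def clipped_bce(y_true, y_prob, eps=EPS):
    """Mean binary cross-entropy after clipping probabilities to [eps, 1 - eps]."""
    p = np.clip(y_prob, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

y = np.array([0.0, 1.0])
for a in (5, 10, 16, 20, 30):
    x = np.array([-a, a], dtype=np.float64)
    exact = np.log1p(np.exp(-a))          # log(1 + exp(-a)), the unclipped value
    clipped = clipped_bce(y, sigmoid(x))  # stops decreasing once sigmoid(-a) < eps
    print(a, exact, clipped)
```

In this sketch the clipped value stops changing once exp(-a) drops below epsilon, while the unclipped value keeps shrinking. The exact floor Keras reports (1.09e-7) may differ slightly from 1e-7 because Keras typically works in float32 and converts the clipped probabilities back to logits first.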
Now, what is this epsilon? It is a very small constant used for numerical stability (e.g. to prevent division by zero or other undefined behavior). To find out its value, you can inspect the source code further; you will find it in the common.py file:
_EPSILON = 1e-7

def epsilon():
    """Returns the value of the fuzz factor used in numeric expressions.

    # Returns
        A float.

    # Example
    ```python
        >>> keras.backend.epsilon()
        1e-07
    ```
    """
    return _EPSILON
If, for any reason, you would like more precision, you can set the epsilon value to a smaller constant using the set_epsilon function from the backend:
def set_epsilon(e):
    """Sets the value of the fuzz factor used in numeric expressions.

    # Arguments
        e: float. New value of epsilon.

    # Example
    ```python
        >>> from keras import backend as K
        >>> K.epsilon()
        1e-07
        >>> K.set_epsilon(1e-05)
        >>> K.epsilon()
        1e-05
    ```
    """
    global _EPSILON
    _EPSILON = e
However, be aware that setting epsilon to an extremely small positive value, or to zero, may disrupt the stability of computations throughout Keras.
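For comparison, the loss can be computed directly from the pre-sigmoid values (logits), in which case no clipping is needed. The sketch below is a NumPy illustration, assuming the stable formulation documented for tf.nn.sigmoid_cross_entropy_with_logits, max(z, 0) - z*y + log(1 + exp(-|z|)); `bce_from_logits` is a hypothetical helper name, not a Keras or Tensorflow function.

```python
import numpy as np

def bce_from_logits(y_true, logits):
    """Mean binary cross-entropy computed from logits, with no clipping.

    Uses max(z, 0) - z * y + log(1 + exp(-|z|)), which avoids overflow in exp
    and never forms a probability that would need clipping.
    """
    z = np.asarray(logits, dtype=np.float64)
    y = np.asarray(y_true, dtype=np.float64)
    per_sample = np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z)))
    return float(np.mean(per_sample))

y = np.array([0.0, 1.0])
for a in (16, 20, 30):
    # With weight 1 and bias 0, the logits are simply the inputs -a and a.
    print(a, bce_from_logits(y, np.array([-a, a])), np.log1p(np.exp(-a)))
```

Here the value keeps tracking log(1+exp(-a)) well below 1e-7. Whether this route is available in a given setup depends on the Keras version and on being able to use a linear output layer, so treat it as an illustration of the numerics rather than a drop-in fix.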