How to calculate Categorical Cross-Entropy by hand?

Views: 116 | Posted: 2020/9/7 19:16:02 | Tags: python tensorflow artificial-intelligence

Problem description

When I calculate binary cross-entropy by hand, I apply sigmoid to get probabilities, then apply the cross-entropy formula and take the mean of the result:

import tensorflow as tf

logits = tf.constant([-1, -1, 0, 1, 2.])
labels = tf.constant([0, 0, 1, 1, 1.])

probs = tf.nn.sigmoid(logits)
loss = labels * (-tf.math.log(probs)) + (1 - labels) * (-tf.math.log(1 - probs))
print(tf.reduce_mean(loss).numpy()) # 0.35197204

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
loss = cross_entropy(labels, logits)
print(loss.numpy()) # 0.35197204
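
The same value can be reproduced without TensorFlow. The sketch below is an illustration in NumPy (an assumption, since the question itself uses TensorFlow); the function name is made up for this example. It uses the rewritten form max(x, 0) - x*z + log(1 + exp(-|x|)), which is algebraically equal to the sigmoid-based formula above but avoids taking the log of a value that has underflowed to zero.

```python
import numpy as np

def binary_crossentropy_from_logits(labels, logits):
    """Per-element binary cross-entropy computed directly from logits.

    Illustrative helper, not part of any library.
    """
    x = np.asarray(logits, dtype=np.float64)
    z = np.asarray(labels, dtype=np.float64)
    # Stable rewrite of z * -log(sigmoid(x)) + (1 - z) * -log(1 - sigmoid(x))
    return np.maximum(x, 0.0) - x * z + np.log1p(np.exp(-np.abs(x)))

logits = np.array([-1, -1, 0, 1, 2.])
labels = np.array([0, 0, 1, 1, 1.])

bce_mean = binary_crossentropy_from_logits(labels, logits).mean()
print(bce_mean)  # approximately 0.351972
```

For moderate logits like these, the direct sigmoid formula and the rewritten form agree to floating-point precision; the difference only matters for logits of large magnitude.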

How do I calculate categorical cross-entropy when logits and labels have different shapes?

logits = tf.constant([[-3.27133679, -22.6687183, -4.15501118, -5.14916372, -5.94609261,
                       -6.93373299, -5.72364092, -9.75725174, -3.15748906, -4.84012318],
                      [-11.7642536, -45.3370094, -3.17252636, 4.34527206, -17.7164974,
                      -0.595088899, -17.6322937, -2.36941719, -6.82157373, -3.47369862],
                      [-4.55468369, -1.07379043, -3.73261762, -7.08982277, -0.0288562477, 
                       -5.46847963, -0.979336262, -3.03667569, -3.29502845, -2.25880361]])
labels = tf.constant([2, 3, 4])

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True,
                                                            reduction='none')
loss = loss_object(labels, logits)
print(loss.numpy()) # [2.0077195  0.00928135 0.6800677 ]
print(tf.reduce_mean(loss).numpy()) # 0.8990229

In other words, how can I get the same result ([2.0077195 0.00928135 0.6800677 ]) by hand?

@OverLordGoldDragon's answer is correct. In TF 2.0 it looks like this:

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction='none')
loss = loss_object(labels, logits)
print(f'{loss.numpy()}\n{tf.math.reduce_sum(loss).numpy()}')

one_hot_labels = tf.one_hot(labels, 10)

preds = tf.nn.softmax(logits)
preds /= tf.math.reduce_sum(preds, axis=-1, keepdims=True)
loss = tf.math.reduce_sum(tf.math.multiply(one_hot_labels, -tf.math.log(preds)), axis=-1)
print(f'{loss.numpy()}\n{tf.math.reduce_sum(loss).numpy()}')
# [2.0077195  0.00928135 0.6800677 ]
# 2.697068691253662
# [2.0077198  0.00928142 0.6800677 ]
# 2.697068929672241
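
The one-hot multiplication is not strictly required: since only the entry at the true class survives the sum, the loss for each row is logsumexp(logits) - logits[label]. The following is a minimal NumPy sketch of that shortcut, assuming the same logits and labels as in the question; the variable names are illustrative.

```python
import numpy as np

logits = np.array([
    [-3.27133679, -22.6687183, -4.15501118, -5.14916372, -5.94609261,
     -6.93373299, -5.72364092, -9.75725174, -3.15748906, -4.84012318],
    [-11.7642536, -45.3370094, -3.17252636, 4.34527206, -17.7164974,
     -0.595088899, -17.6322937, -2.36941719, -6.82157373, -3.47369862],
    [-4.55468369, -1.07379043, -3.73261762, -7.08982277, -0.0288562477,
     -5.46847963, -0.979336262, -3.03667569, -3.29502845, -2.25880361]])
labels = np.array([2, 3, 4])

# log-sum-exp per row, with the row maximum subtracted for numerical stability
row_max = logits.max(axis=-1, keepdims=True)
log_sum_exp = row_max[:, 0] + np.log(np.exp(logits - row_max).sum(axis=-1))

# -log(softmax(logits)[label]) == logsumexp(logits) - logits[label]
sparse_loss = log_sum_exp - logits[np.arange(len(labels)), labels]
print(sparse_loss)         # close to [2.0077195 0.00928135 0.6800677]
print(sparse_loss.mean())  # close to 0.8990229
```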

For a language model:

vocab_size = 9
seq_len = 6
batch_size = 2

labels = tf.reshape(tf.range(batch_size*seq_len), (batch_size,seq_len)) # (2, 6)
logits = tf.random.normal((batch_size,seq_len,vocab_size)) # (2, 6, 9)

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction='none')
loss = loss_object(labels, logits)
print(f'{loss.numpy()}\n{tf.math.reduce_sum(loss).numpy()}')

one_hot_labels = tf.one_hot(labels, vocab_size)

preds = tf.nn.softmax(logits)
preds /= tf.math.reduce_sum(preds, axis=-1, keepdims=True)
loss = tf.math.reduce_sum(tf.math.multiply(one_hot_labels, -tf.math.log(preds)), axis=-1)
print(f'{loss.numpy()}\n{tf.math.reduce_sum(loss).numpy()}')
# [[1.341706  3.2518263 2.6482694 3.039099  1.5835983 4.3498387]
#  [2.67237   3.3978183 2.8657475       nan       nan       nan]]
# nan
# [[1.341706  3.2518263 2.6482694 3.039099  1.5835984 4.3498387]
#  [2.67237   3.3978183 2.8657475 0.        0.        0.       ]]
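# 25.1502742767334
# Note: labels 9, 10 and 11 fall outside vocab_size = 9. The built-in loss returned
# nan for them in this run, while tf.one_hot yields all-zero rows, so the by-hand loss is 0 there.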
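
For sequence-shaped inputs the same idea applies along the last (vocabulary) axis. The sketch below is a NumPy illustration under the assumption that every label lies in [0, vocab_size); the random inputs and variable names are made up for the example. It picks the target log-probability with np.take_along_axis and checks it against the one-hot route.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, seq_len, vocab_size = 2, 6, 9

logits = rng.normal(size=(batch_size, seq_len, vocab_size))
# Keep every label inside [0, vocab_size) so each position has a valid target
labels = rng.integers(0, vocab_size, size=(batch_size, seq_len))

# Log-softmax over the vocabulary axis
shifted = logits - logits.max(axis=-1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

# Pick the log-probability of the target token at each position
token_loss = -np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]

# Same quantity through one-hot labels
one_hot = np.eye(vocab_size)[labels]
token_loss_one_hot = -(one_hot * log_probs).sum(axis=-1)

print(token_loss.shape)  # (2, 6)
```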

Recommended answer

SparseCategoricalCrossentropy is CategoricalCrossentropy that takes integer labels instead of one-hot labels. In this example from the source code, the two losses below are equivalent:

import tensorflow as tf

# Functional forms: they accept from_logits as an argument and return one loss per sample
scce = tf.keras.losses.sparse_categorical_crossentropy
cce = tf.keras.losses.categorical_crossentropy

labels_scce = tf.constant([0, 1, 2])
labels_cce  = tf.constant([[1.,   0,  0], [0,    1,  0], [0,   0,   1]])
preds       = tf.constant([[.90,.05,.05], [.50,.89,.60], [.05,.01,.94]])

loss_cce  = cce(labels_cce,   preds, from_logits=False)
loss_scce = scce(labels_scce, preds, from_logits=False)

# TF 2.x executes eagerly, so no session is needed
print(loss_cce.numpy())
print(loss_scce.numpy())
# [0.10536055  0.8046684  0.0618754]
# [0.10536055  0.8046684  0.0618754]

As for how to do it 'by hand', we can refer to the NumPy backend:

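import numpy as np

# Same values as labels_cce and preds above
np_labels = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float32)
np_preds  = np.array([[.90,.05,.05], [.50,.89,.60], [.05,.01,.94]], dtype=np.float32)

losses = []
for label, pred in zip(np_labels, np_preds):
    # Normalize so the row sums to 1 (this modifies np_preds in place)
    pred /= pred.sum(axis=-1, keepdims=True)
    losses.append(np.sum(label * -np.log(pred), axis=-1, keepdims=False))
print(losses)
# [0.10536055  0.8046684  0.0618754]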
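
The loop can also be written without iteration. This is a small vectorized sketch in NumPy, assuming the same label and prediction values as the example above; unlike the in-place `pred /= ...`, it leaves the input array untouched.

```python
import numpy as np

np_labels = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float64)
np_preds = np.array([[.90, .05, .05], [.50, .89, .60], [.05, .01, .94]])

# Normalize every row at once (out of place, so np_preds is left unchanged)
normalized = np_preds / np_preds.sum(axis=-1, keepdims=True)

# Only the entry where the label is 1 contributes to each row's sum
vectorized_losses = np.sum(np_labels * -np.log(normalized), axis=-1)
print(vectorized_losses)  # close to [0.10536055 0.8046684 0.0618754]
```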

from_logits = True: preds is the model output before it is passed through softmax (so we apply softmax ourselves)

from_logits = False: preds is the model output after softmax (so we skip this step)
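
To make the switch concrete, here is a hedged NumPy sketch. Both `softmax` and `categorical_crossentropy` are hypothetical helpers written for this illustration (they are not the Keras implementation), and the raw scores are made-up values. Passing raw scores with from_logits=True should give the same loss as passing already-softmaxed probabilities with from_logits=False.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax with the row maximum subtracted for stability."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def categorical_crossentropy(one_hot_labels, preds, from_logits=False):
    """Hypothetical helper mirroring the from_logits switch described above."""
    if from_logits:
        preds = softmax(preds)  # raw scores -> probabilities
    return -(one_hot_labels * np.log(preds)).sum(axis=-1)

raw_scores = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])  # made-up logits
one_hot_labels = np.array([[1., 0., 0.], [0., 0., 1.]])

loss_from_logits = categorical_crossentropy(one_hot_labels, raw_scores, from_logits=True)
loss_from_probs = categorical_crossentropy(one_hot_labels, softmax(raw_scores), from_logits=False)
print(loss_from_logits, loss_from_probs)  # the two arrays should match
```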

So, in summary, to compute it by hand:

Convert integer labels to one-hot labels

If preds are model outputs before softmax, compute their softmax

pred /= ... normalizes the predictions before computing logs; this way, high-probability preds on zero-labels penalize correct predictions on one-labels. If from_logits = True, this step is skipped, since softmax already does the normalization

For each observation / sample, compute the element-wise negative log (base e) only where label == 1

Take the mean of the losses over all observations
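
Put together, the whole procedure might look like the sketch below. It is a NumPy illustration with a made-up function name, not the Keras implementation. It assumes that normalization by the row sum is applied only when from_logits is False, because softmax output already sums to 1, and that no predicted probability is exactly zero.

```python
import numpy as np

def sparse_categorical_crossentropy_by_hand(labels, preds, from_logits=False):
    """Illustrative helper: returns (per-sample losses, mean loss)."""
    labels = np.asarray(labels)
    preds = np.asarray(preds, dtype=np.float64)
    num_classes = preds.shape[-1]

    # Convert integer labels to one-hot labels
    one_hot = np.eye(num_classes)[labels]

    if from_logits:
        # Raw model outputs: apply softmax, which also normalizes each row
        exp = np.exp(preds - preds.max(axis=-1, keepdims=True))
        probs = exp / exp.sum(axis=-1, keepdims=True)
    else:
        # Already probabilities: only make sure each row sums to 1
        probs = preds / preds.sum(axis=-1, keepdims=True)

    # Negative natural log, kept only where the label is 1
    per_sample = -(one_hot * np.log(probs)).sum(axis=-1)

    # Mean over all observations
    return per_sample, per_sample.mean()

per_sample, mean_loss = sparse_categorical_crossentropy_by_hand(
    [0, 1, 2], [[.90, .05, .05], [.50, .89, .60], [.05, .01, .94]])
print(per_sample)  # close to [0.10536055 0.8046684 0.0618754]
print(mean_loss)   # close to 0.323968
```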

Lastly, the mathematical formula for categorical crossentropy is:

-\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} 1_{y_i \in C_c} \log p_{model}[y_i \in C_c]

i iterates over the N observations

c iterates over the C classes

1 is the indicator function; here it works as in binary crossentropy, except that it operates on length-C vectors

p_{model}[y_i \in C_c] is the predicted probability that observation i belongs to class c
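
The definitions above can be translated almost literally into two nested loops. The sketch below assumes the usual averaged form of the loss, -(1/N) * sum over i and c of 1[y_i in C_c] * log p_model[y_i in C_c]; the function name and the probabilities are made up for illustration, and a vectorized version would be preferable in practice.

```python
import numpy as np

def categorical_crossentropy_formula(class_ids, probs):
    """Literal double sum over observations i and classes c (illustrative)."""
    n_obs, n_classes = probs.shape
    total = 0.0
    for i in range(n_obs):              # i iterates over the N observations
        for c in range(n_classes):      # c iterates over the C classes
            indicator = 1.0 if class_ids[i] == c else 0.0
            if indicator:               # skip zero terms to avoid 0 * log(0)
                total += indicator * np.log(probs[i, c])
    return -total / n_obs

probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])  # made-up probabilities
class_ids = [0, 1]

formula_loss = categorical_crossentropy_formula(class_ids, probs)
print(formula_loss)  # equals -(log(0.7) + log(0.8)) / 2
```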
