Why do binary_crossentropy and categorical_crossentropy give different performances for the same problem?

Question

I'm trying to train a CNN to categorize text by topic. When I use binary cross-entropy I get ~80% accuracy, with categorical cross-entropy I get ~50% accuracy.

I don't understand why this is. It's a multiclass problem; doesn't that mean that I have to use categorical cross-entropy, and that the results with binary cross-entropy are meaningless?

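from keras.models import Sequential
from keras.layers import Activation, Conv1D, Dense, Dropout, Flatten, MaxPooling1D

# embedding_layer and class_id_index are defined elsewhere
model = Sequential()
model.add(embedding_layer)
model.add(Dropout(0.25))
# convolution layers
model.add(Conv1D(filters=32,
                 kernel_size=4,
                 padding='valid',
                 activation='relu'))
model.add(MaxPooling1D(pool_size=2))
# dense layers
model.add(Flatten())
model.add(Dense(256))
model.add(Dropout(0.25))
model.add(Activation('relu'))
# output layer: one unit per class, softmax over the classes
model.add(Dense(len(class_id_index)))
model.add(Activation('softmax'))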

Then I compile it either like this, using categorical_crossentropy as the loss function:

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

or like this, using binary_crossentropy:

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

Intuitively, it makes sense why I'd want to use categorical cross-entropy. What I don't understand is why I get good results with binary and poor results with categorical.

Recommended answer

The reason for this apparent performance discrepancy between categorical & binary cross entropy is what @xtof54 has already reported in his answer, i.e.:

the accuracy computed with the Keras method evaluate is just plain wrong when using binary_crossentropy with more than 2 labels

I would like to elaborate more on this, demonstrate the actual underlying issue, explain it, and offer a remedy.

This behavior is not a bug; the underlying reason is a rather subtle and undocumented issue in how Keras actually guesses which accuracy to use, depending on the loss function you have selected, when you simply include metrics=['accuracy'] in your model compilation. In other words, while your first compilation option

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

is valid, your second one:

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

will not produce what you expect, but the reason is not the use of binary cross entropy (which, at least in principle, is an absolutely valid loss function).

Why is that? If you check the metrics source code, Keras does not define a single accuracy metric, but several different ones, among them binary_accuracy and categorical_accuracy. What happens under the hood is that, since you have selected binary cross entropy as your loss function and have not specified a particular accuracy metric, Keras (wrongly...) infers that you are interested in the binary_accuracy, and this is what it returns - while in fact you are interested in the categorical_accuracy.
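To make the difference between the two metrics concrete, here is a minimal NumPy sketch of what each one measures. It is not Keras's actual implementation: the function bodies are my own approximation (thresholding at 0.5 for the binary case, argmax for the categorical case), and the toy labels and predictions are hypothetical. The point is that binary accuracy scores every entry of the one-hot vector separately, so the many correctly predicted zeros inflate the result.

```python
import numpy as np

def binary_accuracy(y_true, y_pred, threshold=0.5):
    # Threshold each output unit and compare element-wise with the one-hot labels.
    return float(np.mean((y_pred > threshold).astype(float) == y_true))

def categorical_accuracy(y_true, y_pred):
    # Compare only the most probable class with the true class, per sample.
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

# Hypothetical toy data: 4 samples, 5 classes, true classes 0, 1, 2, 3.
y_true = np.eye(5)[[0, 1, 2, 3]]
y_pred = np.array([
    [0.6, 0.1, 0.1, 0.1, 0.1],  # predicts class 0: correct
    [0.1, 0.1, 0.6, 0.1, 0.1],  # predicts class 2: wrong
    [0.1, 0.6, 0.1, 0.1, 0.1],  # predicts class 1: wrong
    [0.1, 0.1, 0.1, 0.1, 0.6],  # predicts class 4: wrong
])

print(categorical_accuracy(y_true, y_pred))  # 1 of 4 samples correct
print(binary_accuracy(y_true, y_pred))       # 14 of 20 entries match
```

Only one sample in four is classified correctly, yet 14 of the 20 individual entries match, so the element-wise score looks far better than the classifier really is.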

Let's verify that this is the case, using the MNIST CNN example in Keras, with the following modification:

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])  # WRONG way

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=2,  # only 2 epochs, for demonstration purposes
          verbose=1,
          validation_data=(x_test, y_test))

# Keras reported accuracy:
score = model.evaluate(x_test, y_test, verbose=0) 
score[1]
# 0.9975801164627075

# Actual accuracy calculated manually:
import numpy as np
y_pred = model.predict(x_test)
acc = sum([np.argmax(y_test[i])==np.argmax(y_pred[i]) for i in range(10000)])/10000
acc
# 0.98780000000000001

score[1]==acc
# False    
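The manual check above loops over a hard-coded 10000 test samples. As a small sketch, the same calculation can be written in vectorized form so that it works for any number of samples; the helper name manual_accuracy and the toy arrays are hypothetical, introduced only for illustration.

```python
import numpy as np

def manual_accuracy(y_true_onehot, y_pred_probs):
    # Fraction of samples whose most probable class equals the true class.
    true_classes = np.argmax(y_true_onehot, axis=1)
    pred_classes = np.argmax(y_pred_probs, axis=1)
    return float(np.mean(true_classes == pred_classes))

# Hypothetical toy data: 3 samples, 3 classes, true classes 0, 1, 2.
y_true_toy = np.eye(3)[[0, 1, 2]]
y_pred_toy = np.array([
    [0.8, 0.1, 0.1],  # predicts class 0: correct
    [0.2, 0.7, 0.1],  # predicts class 1: correct
    [0.5, 0.3, 0.2],  # predicts class 0: wrong
])

acc_toy = manual_accuracy(y_true_toy, y_pred_toy)
print(acc_toy)
```

With the real data this would be called as manual_accuracy(y_test, model.predict(x_test)). When comparing the result with the value Keras reports, np.isclose is a safer choice than ==, since the two numbers may be computed at different floating-point precision.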

To remedy this, i.e. to keep binary cross entropy as your loss function (as I said, nothing wrong with this, at least in principle) while still getting the categorical accuracy required by the problem at hand, you should ask explicitly for categorical_accuracy in the model compilation, as follows:

from keras.metrics import categorical_accuracy
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=[categorical_accuracy])
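For reference, here is a minimal NumPy sketch of the two losses themselves, written from the textbook definitions rather than copied from Keras. The clipping constant eps and the averaging of the binary loss over the output units are assumptions on my part, and the example vectors are hypothetical. It illustrates that both losses are well defined for one-hot targets; they simply penalize different things, since the binary version also scores the outputs of the non-target classes.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Only the probability assigned to the true class contributes.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Every output unit is treated as an independent yes/no prediction,
    # and the per-unit losses are averaged (an assumption of this sketch).
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_unit = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return np.mean(per_unit, axis=-1)

# Hypothetical example: one sample, three classes, true class is the last one.
y_true_ex = np.array([[0.0, 0.0, 1.0]])
y_pred_ex = np.array([[0.1, 0.2, 0.7]])

cce = categorical_crossentropy(y_true_ex, y_pred_ex)
bce = binary_crossentropy(y_true_ex, y_pred_ex)
print(cce, bce)
```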

In the MNIST example, after training, scoring, and predicting the test set as I show above, the two metrics now are the same, as they should be:

# Keras reported accuracy:
score = model.evaluate(x_test, y_test, verbose=0) 
score[1]
# 0.98580000000000001

# Actual accuracy calculated manually:
y_pred = model.predict(x_test)
acc = sum([np.argmax(y_test[i])==np.argmax(y_pred[i]) for i in range(10000)])/10000
acc
# 0.98580000000000001

score[1]==acc
# True    

System setup:

Python version 3.5.3
Tensorflow version 1.2.1
Keras version 2.0.4

UPDATE: After my post, I discovered that this issue had already been identified in this answer.
