Why does binary accuracy give high accuracy while categorical accuracy gives low accuracy in a multi-class classification problem?


Question

I'm working on a multi-class classification problem using Keras, with binary accuracy and categorical accuracy as metrics. When I evaluate my model I get a really high value for binary accuracy and quite a low one for categorical accuracy. I tried to recreate the binary accuracy metric in my own code, but I am not having much luck. My understanding is that this is the process I need to recreate:

def binary_accuracy(y_true, y_pred):
     return K.mean(K.equal(y_true, K.round(y_pred)), axis=-1)

Here is my code:

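import numpy as np
from keras import backend as K

preds = model.predict(X_test, batch_size=128)
print(preds)

# Assumption: roundpreds was not defined in the original snippet; it is taken
# here to be the predictions rounded to 0 or 1, as binary_accuracy does.
roundpreds = np.round(preds)

pos = 0.0
neg = 0.0

for i, val in enumerate(roundpreds):
    # A sample counts as correct only if the whole rounded vector equals the label vector.
    if val.tolist() == list(y_test[i]):
        pos += 1.0
    else:
        neg += 1.0

print(pos / (pos + neg))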

But this gives a much lower value than the one reported by binary accuracy. Is binary accuracy even an appropriate metric for a multi-class problem? If so, does anyone know where I am going wrong?

Recommended answer

You need to understand what happens when binary accuracy (the metric Keras pairs with binary_crossentropy) is applied to a multi-class prediction.

  1. Let's assume that the softmax output is (0.1, 0.2, 0.3, 0.4) and the one-hot encoded ground truth is (1, 0, 0, 0).
  2. Binary accuracy rounds every output at a 0.5 threshold, so the network output is turned into the vector (0, 0, 0, 0).
  3. (0, 0, 0, 0) matches the ground truth (1, 0, 0, 0) at 3 out of 4 indexes, so the resulting accuracy is 75% for a completely wrong answer!
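
The arithmetic in these three steps can be checked with a small plain-Python sketch. It does not call Keras; the two helper functions are illustrative re-implementations written for this example, assuming one-hot labels and a 0.5 rounding threshold.

```python
def binary_accuracy(y_true, y_pred):
    """Fraction of positions where the thresholded prediction equals the label."""
    # Illustrative helper: values above 0.5 become 1, everything else becomes 0.
    rounded = [1 if p > 0.5 else 0 for p in y_pred]
    matches = [int(t == r) for t, r in zip(y_true, rounded)]
    return sum(matches) / len(matches)


def categorical_accuracy(y_true, y_pred):
    """1.0 if the highest-scoring class is the true class, otherwise 0.0."""
    true_class = max(range(len(y_true)), key=lambda i: y_true[i])
    pred_class = max(range(len(y_pred)), key=lambda i: y_pred[i])
    return 1.0 if true_class == pred_class else 0.0


y_true = [1, 0, 0, 0]
y_pred = [0.1, 0.2, 0.3, 0.4]

print(binary_accuracy(y_true, y_pred))       # 0.75
print(categorical_accuracy(y_true, y_pred))  # 0.0
```

The same prediction scores 75% under the element-wise metric and 0% under the arg-max metric, which is the gap described in the question.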

To solve this you could use a single-class accuracy, for example:

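from keras import backend as K

def single_class_accuracy(interesting_class_id):
    def fn(y_true, y_pred):
        # Assumes one-hot labels, as in the example above: recover class ids with argmax.
        class_id_true = K.argmax(y_true, axis=-1)
        class_id_preds = K.argmax(y_pred, axis=-1)
        # Replace class_id_preds with class_id_true for recall here
        positive_mask = K.cast(K.equal(class_id_preds, interesting_class_id), 'int32')
        true_mask = K.cast(K.equal(class_id_true, interesting_class_id), 'int32')
        # 1.0 where "predicted as this class" agrees with "is this class"
        acc_mask = K.cast(K.equal(positive_mask, true_mask), 'float32')
        class_acc = K.mean(acc_mask)
        return class_acc

    return fn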
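
To see what a metric of this kind measures, here is a rough plain-Python mirror of the same idea on a tiny batch. It is only a sketch: `single_class_accuracy_py` and `argmax` are hypothetical helpers written for this illustration, the data is made up, and one-hot labels are assumed.

```python
def argmax(values):
    """Index of the largest element (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def single_class_accuracy_py(y_true, y_pred, class_id):
    """Fraction of samples where 'predicted as class_id' agrees with 'is class_id'."""
    agree = 0
    for t, p in zip(y_true, y_pred):
        predicted_positive = argmax(p) == class_id
        actually_positive = argmax(t) == class_id
        agree += int(predicted_positive == actually_positive)
    return agree / len(y_true)


# Made-up batch of four samples with four classes each.
y_true = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0]]
y_pred = [
    [0.1, 0.2, 0.3, 0.4],  # predicted class 3, true class 0
    [0.2, 0.5, 0.2, 0.1],  # predicted class 1, true class 1
    [0.6, 0.1, 0.2, 0.1],  # predicted class 0, true class 2
    [0.7, 0.1, 0.1, 0.1],  # predicted class 0, true class 0
]

print(single_class_accuracy_py(y_true, y_pred, 0))  # 0.5
print(single_class_accuracy_py(y_true, y_pred, 1))  # 1.0
```

In Keras itself, the closure returned by `single_class_accuracy(class_id)` would typically be passed in the `metrics` list of `model.compile`, one entry per class of interest.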
