Multilabel, multiclass accuracy: how to calculate accuracy for multiclass, multilabel classification?


Problem description

I am working on a multilabel, multiclass classification framework, and I want to add metrics for computing multilabel and multiclass accuracy.

Here is the demo data:

predicted_labels = [[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,1,0,1]]
true_labels      = [[1,1,0,0,1],[1,0,0,1,1],[1,0,0,0,1],[1,1,1,0,1],[1,0,0,0,1],[1,0,0,0,1]]

The most popular accuracy metrics for multi-label, multi-class classification are:

  1. Hamming score
  2. Hamming loss
  3. Subset accuracy

The code for the three metrics above is:

import numpy as np
import sklearn.metrics

def hamming_score(y_true, y_pred, normalize=True, sample_weight=None):
    '''
    Compute the Hamming score (a.k.a. label-based accuracy) for the multi-label case.
    Also returns subset accuracy and Hamming loss.
    Note: normalize and sample_weight are not used below.
    '''
    # Accept lists of lists as well as NumPy arrays
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc_list = []
    for i in range(y_true.shape[0]):
        # Indices of the labels that are set in each row
        set_true = set(np.where(y_true[i])[0])
        set_pred = set(np.where(y_pred[i])[0])
        if len(set_true) == 0 and len(set_pred) == 0:
            # Both label sets are empty: count as a perfect match
            tmp_a = 1
        else:
            # Intersection over union of the true and predicted label sets
            tmp_a = len(set_true.intersection(set_pred)) / \
                    float(len(set_true.union(set_pred)))
        acc_list.append(tmp_a)

    return {'hamming_score': np.mean(acc_list),
            'subset_accuracy': sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None),
            'hamming_loss': sklearn.metrics.hamming_loss(y_true, y_pred)}
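
To make explicit what each of the three metrics measures, here is a minimal dependency-free sketch applied to the demo data. The helper names `label_set` and `multilabel_summary` are hypothetical and not part of the original post; the sketch treats two empty label sets as a perfect match, as the function above does.

```python
# Demo data copied from the question.
predicted_labels = [[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,1,0,1]]
true_labels      = [[1,1,0,0,1],[1,0,0,1,1],[1,0,0,0,1],[1,1,1,0,1],[1,0,0,0,1],[1,0,0,0,1]]

def label_set(row):
    """Indices of the labels that are switched on in one sample."""
    return {i for i, v in enumerate(row) if v}

def multilabel_summary(y_true, y_pred):
    """Hypothetical helper: the three metrics in plain Python."""
    n_samples = len(y_true)
    n_labels = len(y_true[0])
    jaccards, exact, wrong = [], 0, 0
    for t, p in zip(y_true, y_pred):
        st, sp = label_set(t), label_set(p)
        union = st | sp
        # Hamming score: per-sample intersection over union
        jaccards.append(len(st & sp) / len(union) if union else 1.0)
        # Subset accuracy: the whole label vector must match
        exact += int(t == p)
        # Hamming loss: count individual label positions that disagree
        wrong += sum(a != b for a, b in zip(t, p))
    return {
        'hamming_score': sum(jaccards) / n_samples,
        'subset_accuracy': exact / n_samples,
        'hamming_loss': wrong / (n_samples * n_labels),
    }

summary = multilabel_summary(true_labels, predicted_labels)
print(summary)
```

On this data the sketch gives a Hamming score of about 0.75, a subset accuracy of 2/6 (only two rows match exactly), and a Hamming loss of 5/30 (five of the thirty label positions disagree).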

But I was looking for an F1 score for multilabel classification, so I tried sklearn's f1_score:

print(f1_score(demo, true, average='micro'))

But it gave me an error:

> ValueError: multiclass-multioutput is not supported

I converted the data into NumPy arrays and used f1_score again:

print(f1_score(np.array(true_labels),np.array(predicted_labels), average='micro'))

Then I got a score:

0.8275862068965517
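
The micro-averaged value can be checked by hand: micro averaging pools true positives, false positives, and false negatives over every sample and every label before computing F1. A minimal sketch in plain Python follows; `micro_f1` is a hypothetical helper, and returning 0 when the denominator is zero is an assumption.

```python
# Demo data copied from the question.
predicted_labels = [[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,1,0,1]]
true_labels      = [[1,1,0,0,1],[1,0,0,1,1],[1,0,0,0,1],[1,1,1,0,1],[1,0,0,0,1],[1,0,0,0,1]]

def micro_f1(y_true, y_pred):
    """Hypothetical helper: F1 from counts pooled over all samples and labels."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Assumption: report 0.0 when there are no positives at all
    return 2 * tp / denom if denom else 0.0

score = micro_f1(true_labels, predicted_labels)
print(score)  # 24/29, about 0.8276
```

Here the pooled counts are 12 true positives, 1 false positive, and 4 false negatives, so the score is 24/29, which agrees with the value printed above.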

I ran one more experiment: I took the true and predicted labels one example at a time, computed the F1 score for each pair, and then took the mean:

# Named per_sample_scores to avoid shadowing sklearn.metrics.accuracy_score
per_sample_scores = []

for tru, pred in zip(true_labels, predicted_labels):
    per_sample_scores.append(f1_score(tru, pred, average='micro'))

print(np.mean(per_sample_scores))

Output:

0.8333333333333335

The two values differ.
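
One likely reason for the gap: a single row passed on its own is a 1-D vector, so it is scored as an ordinary single-label problem with classes 0 and 1, and micro-averaged F1 over all classes of a single-label problem equals plain accuracy. The per-row mean is therefore the fraction of matching label positions, that is, 1 minus the Hamming loss, rather than a multilabel F1. A minimal plain-Python sketch, with the hypothetical helper `micro_f1_single_label`:

```python
# Demo data copied from the question.
predicted_labels = [[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,1,0,1]]
true_labels      = [[1,1,0,0,1],[1,0,0,1,1],[1,0,0,0,1],[1,1,1,0,1],[1,0,0,0,1],[1,0,0,0,1]]

def micro_f1_single_label(y_true, y_pred):
    """Hypothetical helper: micro F1 over classes 0 and 1 of one 1-D vector."""
    classes = sorted(set(y_true) | set(y_pred))
    tp = fp = fn = 0
    for c in classes:
        for t, p in zip(y_true, y_pred):
            tp += int(t == c and p == c)
            fp += int(t != c and p == c)
            fn += int(t == c and p != c)
    return 2 * tp / (2 * tp + fp + fn)

pairs = list(zip(true_labels, predicted_labels))
per_row_f1 = [micro_f1_single_label(t, p) for t, p in pairs]
# Plain accuracy of each row: share of positions where the values agree
per_row_acc = [sum(a == b for a, b in zip(t, p)) / len(t) for t, p in pairs]

mean_per_row = sum(per_row_f1) / len(per_row_f1)
n_positions = sum(len(t) for t in true_labels)
hamming_loss = sum(a != b for t, p in pairs for a, b in zip(t, p)) / n_positions

print(per_row_f1)
print(per_row_acc)
print(mean_per_row, 1 - hamming_loss)
```

Both lists agree row by row, and the mean comes out at 5/6, matching the 0.8333 above, whereas the array call pools counts for the positive labels only.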

Why does it fail on a list of lists but work on NumPy arrays? And which method is correct: scoring the examples one by one and taking the mean, or passing all samples at once as NumPy arrays?

What other metrics are available for measuring multilabel classification accuracy?

Recommended answer

You can check this answer and the other answers where this has already been discussed.
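
As an illustration of other multilabel metrics, two further ways of averaging F1 are per sample and per label, intended to mirror the `average='samples'` and `average='macro'` options of sklearn's f1_score. The sketch below is plain Python rather than the library implementation; the helper `binary_f1` is hypothetical, and scoring an undefined F1 as 0 is an assumption.

```python
# Demo data copied from the question.
predicted_labels = [[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,1,0,1]]
true_labels      = [[1,1,0,0,1],[1,0,0,1,1],[1,0,0,0,1],[1,1,1,0,1],[1,0,0,0,1],[1,0,0,0,1]]

def binary_f1(t_vec, p_vec):
    """Hypothetical helper: F1 of the positive label over paired 0/1 values."""
    tp = sum(t == 1 and p == 1 for t, p in zip(t_vec, p_vec))
    fp = sum(t == 0 and p == 1 for t, p in zip(t_vec, p_vec))
    fn = sum(t == 1 and p == 0 for t, p in zip(t_vec, p_vec))
    denom = 2 * tp + fp + fn
    # Assumption: an undefined F1 (no positives at all) is scored as 0
    return 2 * tp / denom if denom else 0.0

# Samples average: one F1 per row (sample), then the mean over rows
row_scores = [binary_f1(t, p) for t, p in zip(true_labels, predicted_labels)]
samples_f1 = sum(row_scores) / len(row_scores)

# Macro average: one F1 per column (label), then the mean over labels
col_scores = [binary_f1(t, p) for t, p in zip(zip(*true_labels), zip(*predicted_labels))]
macro_f1 = sum(col_scores) / len(col_scores)

print(samples_f1)
print(macro_f1)
```

On the demo data the samples average is about 0.844 while the macro average is 0.4, because three of the five labels are never predicted correctly. Which one is appropriate depends on whether rare labels should weigh as much as frequent ones.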
