Facing ValueError: Target is multiclass but average='binary'

Question

I'm new to Python and to machine learning. For my task, I'm trying to use the Naive Bayes algorithm on my dataset.

I can compute the accuracy, but when I try to compute precision and recall for the same predictions, it throws the following error:

   "choose another average setting." % y_type)
ValueError: Target is multiclass but average='binary'. Please choose another average setting.

Can anyone suggest how to proceed? I tried passing average='micro' to the precision and recall scores. That ran without errors, but it gives the same score for accuracy, precision, and recall.

train_data.csv:

review,label
Colors & clarity is superb,positive
Sadly the picture is not nearly as clear or bright as my 40 inch Samsung,negative

test_data.csv:

review,label
The picture is clear and beautiful,positive
Picture is not clear,negative

My code:

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import confusion_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score


def load_data(filename):
    reviews = list()
    labels = list()
    with open(filename) as file:
        file.readline()
        for line in file:
            line = line.strip().split(',')
            labels.append(line[1])
            reviews.append(line[0])

    return reviews, labels

X_train, y_train = load_data('/Users/abc/Sep_10/train_data.csv')
X_test, y_test = load_data('/Users/abc/Sep_10/test_data.csv')

vec = CountVectorizer() 

X_train_transformed =  vec.fit_transform(X_train) 

X_test_transformed = vec.transform(X_test)

clf= MultinomialNB()
clf.fit(X_train_transformed, y_train)

score = clf.score(X_test_transformed, y_test)
print("score of Naive Bayes algo is :" , score)

y_pred = clf.predict(X_test_transformed)
print(confusion_matrix(y_test,y_pred))

print("Precision Score : ",precision_score(y_test,y_pred,pos_label='positive'))
print("Recall Score :" , recall_score(y_test, y_pred, pos_label='positive') )

Recommended answer

You need to pass the 'average' parameter. According to the documentation:

average : string, [None, ‘binary’ (default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’]

This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:

Do this:

print("Precision Score : ", precision_score(y_test, y_pred,
                                            pos_label='positive',
                                            average='micro'))
print("Recall Score : ", recall_score(y_test, y_pred,
                                      pos_label='positive',
                                      average='micro'))

Replace 'micro' with any of the options above except 'binary' (note that 'samples' is meant for multilabel targets). Also, in the multiclass setting there is no need to provide 'pos_label', as it is ignored anyway.
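As a rough sketch of how these options differ, the snippet below scores a small hand-made three-class example. The labels and predictions (including the 'neutral' class) are invented for illustration and are not taken from the question's data.

```python
from sklearn.metrics import precision_score, recall_score

# Invented three-class example (the "neutral" class is hypothetical).
y_true = ["positive", "positive", "positive", "negative", "negative", "neutral"]
y_pred = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# The default average='binary' is rejected for a multiclass target.
try:
    precision_score(y_true, y_pred)
    default_error = None
except ValueError as exc:
    default_error = str(exc)
print("default average:", default_error)

# average=None returns one score per class, in sorted label order:
# negative, neutral, positive.
per_class_precision = precision_score(y_true, y_pred, average=None)
per_class_recall = recall_score(y_true, y_pred, average=None)
print("per-class precision:", per_class_precision)
print("per-class recall:   ", per_class_recall)

# Each averaging mode collapses the per-class scores in a different way.
averaged = {
    avg: (
        precision_score(y_true, y_pred, average=avg),
        recall_score(y_true, y_pred, average=avg),
    )
    for avg in ("micro", "macro", "weighted")
}
for avg, (precision, recall) in averaged.items():
    print(f"{avg:>8}: precision={precision:.3f} recall={recall:.3f}")
```

Here 'macro' is the unweighted mean of the per-class scores, 'weighted' weights each class by its number of true instances, and 'micro' pools the counts over all classes before computing the score.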

Update for comment:

Yes, they can be equal. This is stated in the user guide:

Note that for "micro"-averaging in a multiclass setting with all labels included will produce equal precision, recall and F, while "weighted" averaging may produce an F-score that is not between precision and recall.
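The equality can be checked directly. In a single-label multiclass problem, every wrong prediction counts as one false positive (for the predicted class) and one false negative (for the true class), so the pooled counts give the same value as accuracy. The sketch below uses invented labels, not the question's data.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)

# Invented three-class example (the "neutral" class is hypothetical).
y_true = ["positive", "positive", "positive", "negative", "negative", "neutral"]
y_pred = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

accuracy = accuracy_score(y_true, y_pred)
micro_precision = precision_score(y_true, y_pred, average="micro")
micro_recall = recall_score(y_true, y_pred, average="micro")
micro_f1 = f1_score(y_true, y_pred, average="micro")

# All four values coincide when every label is included.
print(accuracy, micro_precision, micro_recall, micro_f1)
```

If identical numbers are not informative enough, 'macro' or 'weighted' averaging, or average=None for per-class scores, will usually separate precision from recall.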
