Facing ValueError: Target is multiclass but average='binary'
Problem description
I'm new to Python as well as machine learning. For my requirement, I'm trying to use the Naive Bayes algorithm on my dataset.
I'm able to compute the accuracy, but when I try to compute precision and recall for the same model, it throws the following error:
"choose another average setting." % y_type)
ValueError: Target is multiclass but average='binary'. Please choose another average setting.
Can anyone suggest how to proceed? I tried using average='micro' in the precision and recall scores. It ran without any errors, but it gives the same score for accuracy, precision, and recall.
train_data.csv:
review,label
Colors & clarity is superb,positive
Sadly the picture is not nearly as clear or bright as my 40 inch Samsung,negative
test_data.csv:
review,label
The picture is clear and beautiful,positive
Picture is not clear,negative
My code:
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import confusion_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score

def load_data(filename):
    reviews = list()
    labels = list()
    with open(filename) as file:
        file.readline()
        for line in file:
            line = line.strip().split(',')
            labels.append(line[1])
            reviews.append(line[0])
    return reviews, labels

X_train, y_train = load_data('/Users/abc/Sep_10/train_data.csv')
X_test, y_test = load_data('/Users/abc/Sep_10/test_data.csv')

vec = CountVectorizer()
X_train_transformed = vec.fit_transform(X_train)
X_test_transformed = vec.transform(X_test)

clf = MultinomialNB()
clf.fit(X_train_transformed, y_train)
score = clf.score(X_test_transformed, y_test)
print("score of Naive Bayes algo is :", score)

y_pred = clf.predict(X_test_transformed)
print(confusion_matrix(y_test, y_pred))
print("Precision Score : ", precision_score(y_test, y_pred, pos_label='positive'))
print("Recall Score :", recall_score(y_test, y_pred, pos_label='positive'))
Recommended answer
You need to add the 'average' parameter. According to the documentation:
average : string, [None, ‘binary’ (default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’]
This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:
Do this:
print("Precision Score : ", precision_score(y_test, y_pred,
                                            pos_label='positive',
                                            average='micro'))
print("Recall Score : ", recall_score(y_test, y_pred,
                                      pos_label='positive',
                                      average='micro'))
Replace 'micro' with any one of the above options except 'binary'. Also, in the multiclass setting, there is no need to provide 'pos_label', as it will be ignored anyway.
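To make the difference between the averaging options more concrete, here is a minimal pure-Python sketch of what average=None, 'macro' and 'weighted' compute for precision. It does not call scikit-learn; the helper names (per_class_precision, macro_precision, weighted_precision) and the three-class labels are hypothetical, invented only for illustration.

```python
from collections import Counter

def per_class_precision(y_true, y_pred):
    """Return {label: precision}, which is what average=None reports per class."""
    labels = sorted(set(y_true) | set(y_pred))
    predicted = Counter(y_pred)  # TP + FP for each label
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # TP for each label
    # Precision = TP / (TP + FP); treated as 0.0 when a label is never predicted.
    return {lab: (correct[lab] / predicted[lab] if predicted[lab] else 0.0)
            for lab in labels}

def macro_precision(y_true, y_pred):
    """Unweighted mean of the per-class precisions."""
    scores = per_class_precision(y_true, y_pred)
    return sum(scores.values()) / len(scores)

def weighted_precision(y_true, y_pred):
    """Mean of the per-class precisions, weighted by each class's true count."""
    scores = per_class_precision(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[lab] * support[lab] for lab in scores) / len(y_true)

# Hypothetical three-class data, made up for this example.
y_true = ["positive", "positive", "positive", "negative", "negative", "neutral"]
y_pred = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

print(per_class_precision(y_true, y_pred))  # negative 0.5, neutral 0.5, positive 1.0
print(macro_precision(y_true, y_pred))      # roughly 0.667
print(weighted_precision(y_true, y_pred))   # 0.75
```

With imbalanced classes the macro and weighted results differ, as they do here, so the right choice depends on whether every class should count equally or in proportion to how often it occurs. Note that 'samples' is meant for multilabel targets rather than a plain multiclass problem like this one.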
Update for comment:
Yes, they can be equal. It's given in the user guide here:
Note that for "micro"-averaging in a multiclass setting with all labels included will produce equal precision, recall and F, while "weighted" averaging may produce an F-score that is not between precision and recall.
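The reason is easy to check by hand: when every sample has exactly one true label and one predicted label, each wrong prediction is a false positive for the predicted class and a false negative for the true class, so the pooled FP and FN totals are the same. A small pure-Python sketch (the function names and labels are hypothetical, and this is not scikit-learn's implementation):

```python
def micro_precision_recall(y_true, y_pred):
    """Pool TP, FP and FN over all labels, then compute precision and recall."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for lab in labels:
        for t, p in zip(y_true, y_pred):
            if p == lab and t == lab:
                tp += 1   # predicted lab and it was correct
            elif p == lab:
                fp += 1   # predicted lab but the true label was something else
            elif t == lab:
                fn += 1   # true label was lab but something else was predicted
    return tp / (tp + fp), tp / (tp + fn)

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical three-class data, made up for this example.
y_true = ["positive", "positive", "positive", "negative", "negative", "neutral"]
y_pred = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

precision, recall = micro_precision_recall(y_true, y_pred)
print(precision, recall, accuracy(y_true, y_pred))  # all three are 4/6
```

So identical accuracy, micro-precision and micro-recall are expected here and do not indicate a bug; 'macro' or 'weighted' averaging, or average=None, gives scores that can differ from accuracy.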