How to calculate the overall accuracy of a custom-trained spaCy NER model with a confusion matrix?

Question

I'm trying to evaluate my custom-trained spaCy NER model. How can I find the model's overall accuracy using a confusion matrix?

I tried evaluating the model with the spaCy Scorer, following the reference below, which gives precision, recall and token accuracy:

Evaluation in a Spacy NER model

I expect the output to be a confusion matrix rather than individual precision, recall and token accuracy scores.

Recommended answer

Here is a good read on creating confusion matrices for spaCy NER models. It is based on the BILOU format used by spaCy. It works well for small portions of text, but when bigger documents are evaluated the confusion matrix is hard to read because most of the text is O-labeled.

What you can do is create two lists, one with the predicted tag for each word and one with the true tag for each word, and compare them using the sklearn.metrics.confusion_matrix() function.

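from sklearn.metrics import confusion_matrix

# One tag per word; the tags must be strings.
y_true = ["O", "O", "O", "B-PER", "I-PER"]
y_pred = ["O", "O", "O", "B-PER", "O"]
# Rows are true labels, columns are predicted labels, in the order given by `labels`.
confusion_matrix(y_true, y_pred, labels=["O", "B-PER", "I-PER"])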
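The two lists have to be built from your annotations and from the model's output. The following is a minimal sketch of one way to do that, assuming entities are available as (start_char, end_char, label) triples and that tokens are simply whitespace-separated words. The helper name `spans_to_iob` and the sample sentence are made up for illustration, and the sketch uses plain B-/I-/O tags to match the example above rather than the full BILOU scheme. With a real pipeline you would use spaCy's own tokenization instead of `str.split()`.

```python
def spans_to_iob(text, spans):
    """Tag whitespace-separated tokens with B-/I-/O labels.

    `spans` holds (start_char, end_char, label) triples.
    Hypothetical helper for illustration only.
    """
    # Record the character offsets of every whitespace-separated token.
    tokens, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        pos = start + len(word)
        tokens.append((start, pos))

    tags = ["O"] * len(tokens)
    for span_start, span_end, label in spans:
        # Tokens that lie fully inside this entity span.
        inside = [i for i, (s, e) in enumerate(tokens)
                  if s >= span_start and e <= span_end]
        for n, i in enumerate(inside):
            tags[i] = ("B-" if n == 0 else "I-") + label
    return tags


text = "Yesterday I met John Smith"
gold_spans = [(16, 26, "PER")]  # annotated entity: "John Smith"
pred_spans = [(16, 20, "PER")]  # assumed model output: only "John"

y_true = spans_to_iob(text, gold_spans)
y_pred = spans_to_iob(text, pred_spans)
print(y_true)
print(y_pred)
```

For a whole evaluation set, the per-sentence lists can be concatenated into one long `y_true` and one long `y_pred` before building the matrix.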

You can also use the plot_confusion_matrix() function from the same library to get a visual output; however, this requires scikit-learn 0.23.1 or above and is only usable with scikit-learn classifiers. Note that plot_confusion_matrix() was deprecated in scikit-learn 1.0 and removed in 1.2; ConfusionMatrixDisplay.from_predictions() replaces it and works directly on lists of labels.

As written in this Stack Overflow question, this is a way to use confusion_matrix() from scikit-learn and draw the plot yourself, without scikit-learn's plotting helper.

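import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# y_test and pred are the true and predicted labels, one entry per sample.
labels = ['business', 'health']
# `labels` must be passed by keyword in recent scikit-learn versions.
cm = confusion_matrix(y_test, pred, labels=labels)
print(cm)

fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(cm)
plt.title('Confusion matrix of the classifier')
fig.colorbar(cax)
# Pin one tick per class so the tick labels line up with the matrix cells.
ax.set_xticks(range(len(labels)))
ax.set_yticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()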
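Neither snippet above reports a single accuracy figure, which is what the question asks for. Overall accuracy can be read off any confusion matrix as the sum of the diagonal (correct predictions) divided by the sum of all cells. Below is a small dependency-free sketch of that calculation. The function names `confusion_counts` and `overall_accuracy` are illustrative, not part of scikit-learn, and the rows-are-true, columns-are-predicted layout is chosen to match confusion_matrix(). Because most tokens are O-labeled, the sketch also computes accuracy over entity tokens only.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, labels):
    """Build a matrix with true labels as rows and predicted labels as columns."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in labels] for t in labels]


def overall_accuracy(cm):
    """Diagonal (correct predictions) divided by the total number of predictions."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total if total else 0.0


labels = ["O", "B-PER", "I-PER"]
y_true = ["O", "O", "O", "B-PER", "I-PER"]
y_pred = ["O", "O", "O", "B-PER", "O"]

cm = confusion_counts(y_true, y_pred, labels)
accuracy = overall_accuracy(cm)

# Same idea restricted to tokens whose true tag is an entity tag,
# so the many correctly predicted "O" tokens do not inflate the score.
entity_idx = [i for i, label in enumerate(labels) if label != "O"]
entity_correct = sum(cm[i][i] for i in entity_idx)
entity_total = sum(sum(cm[i]) for i in entity_idx)
entity_accuracy = entity_correct / entity_total if entity_total else 0.0

print(cm)
print(accuracy, entity_accuracy)
```

With a NumPy array returned by confusion_matrix(), the same overall figure is `cm.trace() / cm.sum()`.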
