How to get the precision and recall from an NLTK classifier?

Problem description

import nltk
from nltk.corpus import movie_reviews
from nltk.tokenize import word_tokenize

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]


all_words = []

for w in movie_reviews.words():
    all_words.append(w.lower())

all_words = nltk.FreqDist(all_words)

word_features = list(all_words.keys())[:3000]

def find_features(document):
    words = set(document)
    features = {}
    for w in word_features:
        features[w] = (w in words)

    return features

featuresets = [(find_features(rev), category) for (rev, category) in documents]

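# Note: these two slices overlap on [500:1500], so the test set contains
# training examples and the reported accuracy will be optimistic.
training_set = featuresets[500:1500]
testing_set = featuresets[:1500]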

classifier = nltk.DecisionTreeClassifier.train(training_set)

print("Classifier accuracy percent:", nltk.classify.accuracy(classifier, testing_set) * 100, "%")

string = input("Enter the string: ")
print (classifier.classify(find_features(word_tokenize(string))))

This code prints the accuracy of the classifier, then reads a string from the user and returns the polarity of that string.

But here's my question: since I can obtain the accuracy with nltk.classify.accuracy(), is it possible to get the classifier's precision and recall as well?

Recommended answer

If you're using the nltk package, you can use the recall and precision functions from nltk.metrics.scores (see the docs).

The functions should be available after importing them:

from nltk.metrics.scores import (precision, recall)

Then you need to call them with two sets: reference (the known labels) and test (the output of your classifier on the test set).
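To make the roles of the two arguments concrete, here is a minimal sketch of the set-based definitions, written in plain Python rather than calling NLTK. The helper names `set_precision` and `set_recall` and the toy index sets are illustrative assumptions; they are meant to mirror how `precision(reference, test)` and `recall(reference, test)` treat their arguments, not to replace them.

```python
def set_precision(reference, test):
    """Fraction of the items in `test` that also appear in `reference`."""
    if not test:
        return None  # undefined when nothing was predicted for this label
    return len(reference & test) / len(test)


def set_recall(reference, test):
    """Fraction of the items in `reference` that also appear in `test`."""
    if not reference:
        return None  # undefined when no example truly has this label
    return len(reference & test) / len(reference)


# Hypothetical toy data: each number is the index of a test document.
reference_pos = {0, 1, 2, 3}   # documents whose true label is 'pos'
predicted_pos = {2, 3, 4}      # documents the classifier labelled 'pos'

print("Precision:", set_precision(reference_pos, predicted_pos))
print("Recall:", set_recall(reference_pos, predicted_pos))
```

Both scores share the same numerator (the overlap of the two sets); only the denominator changes, which is why the order of the arguments matters.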

Something like the code below should produce these sets as refsets and testsets:

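import collections

# Map each label to the set of test-example indices carrying that label.
refsets = collections.defaultdict(set)   # true labels
testsets = collections.defaultdict(set)  # labels predicted by the classifier

for i, (feats, label) in enumerate(testing_set):
    refsets[label].add(i)
    observed = classifier.classify(feats)
    testsets[observed].add(i)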

Then you can see the precision and recall for the positive class with something like:

print('Precision:', nltk.metrics.precision(refsets['pos'], testsets['pos']))
print('Recall:', nltk.metrics.recall(refsets['pos'], testsets['pos']))
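The same bookkeeping extends to every label, not just 'pos'. The sketch below is self-contained and uses hypothetical `gold` and `predicted` lists in place of `testing_set` and `classifier.classify`; the helper `per_label_scores` is an illustrative name, not part of NLTK. It computes precision, recall and their harmonic mean (F1) directly from the index sets.

```python
import collections


def per_label_scores(gold, predicted):
    """Return {label: (precision, recall, f1)} from two parallel label lists."""
    refsets = collections.defaultdict(set)
    testsets = collections.defaultdict(set)
    for i, (g, p) in enumerate(zip(gold, predicted)):
        refsets[g].add(i)    # index i truly has label g
        testsets[p].add(i)   # index i was predicted as label p

    scores = {}
    for label in sorted(set(refsets) | set(testsets)):
        ref = refsets.get(label, set())
        test = testsets.get(label, set())
        hits = len(ref & test)
        precision = hits / len(test) if test else None
        recall = hits / len(ref) if ref else None
        # F1 is left as None when either score is missing or zero.
        if precision and recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = None
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical gold labels and classifier outputs for five documents.
gold = ["pos", "pos", "neg", "neg", "pos"]
predicted = ["pos", "neg", "neg", "pos", "pos"]

for label, (p, r, f) in per_label_scores(gold, predicted).items():
    print(label, "precision:", p, "recall:", r, "F1:", f)
```

With NLTK installed, the inner computation could instead call nltk.metrics.precision, nltk.metrics.recall and nltk.metrics.f_measure on refsets[label] and testsets[label]; the plain-Python version is used here only so the example runs without the corpus.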
