Show label probability/confidence in NLTK


Question

I'm using the MaxEnt classifier from the Python NLTK library. My dataset has many possible labels and, as expected, MaxEnt returns just one of them. I have trained a model on my dataset and get about 80% accuracy. I've also tested the model on unknown data items, and the results are good. However, for any given unknown input, I want to be able to print/display a ranking of all the possible labels based on whatever internal criterion MaxEnt used to select the one it returned, such as confidence/probability. For example, suppose I have a, b, c as possible labels and I call MaxEnt.classify(input). I currently get one label, let's say c. What I want to see is something like a (0.9), b (0.7), c (0.92), so I can understand why c was selected and possibly choose multiple labels based on those scores. Apologies for my fuzzy terminology; I'm fairly new to NLP and machine learning.

Solution

Based on the accepted answer, here's a skeleton code example to demonstrate what I wanted and how it can be achieved. More classifier examples are available on the NLTK website.

import nltk

# read_data() and feature_sets() are user-defined; new_input is an unseen item
contents = read_data('mydataset.csv')
data_set = [(feature_sets(text), label) for (label, text) in contents]
train_set, test_set = data_set[:1000], data_set[1000:]
maxent = nltk.MaxentClassifier.train(train_set)
maxent.classify(feature_sets(new_input)) # Returns one label
multi_label = maxent.prob_classify(feature_sets(new_input)) # Returns a DictionaryProbDist object
# maxent.labels() lists each distinct label once (train_set repeats labels)
for label in maxent.labels():
    print(label, multi_label.prob(label))
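
To get the ranking and the "choose multiple labels" behaviour asked about in the question, the probabilities can be sorted and filtered. The following is only a minimal sketch: rank_labels() and FakeDist are hypothetical names introduced here, the numbers are made up, and FakeDist merely stands in for the object returned by prob_classify() so the snippet runs without a trained model. It relies only on the samples() and prob() methods of NLTK's probability distributions. Since the values form a probability distribution, they should sum to 1 across labels, so the scores would look more like 0.30/0.15/0.55 than 0.9/0.7/0.92.

```python
def rank_labels(dist, threshold=0.0):
    """Return (label, probability) pairs, most probable first.

    `dist` is any object exposing samples() and prob(), the two methods
    this sketch uses from NLTK's probability-distribution interface.
    Pairs whose probability is below `threshold` are dropped.
    """
    scored = [(label, dist.prob(label)) for label in dist.samples()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(label, p) for label, p in scored if p >= threshold]


class FakeDist:
    """Hypothetical stand-in for the object returned by prob_classify()."""

    def __init__(self, probs):
        self._probs = dict(probs)

    def samples(self):
        return list(self._probs)

    def prob(self, label):
        return self._probs.get(label, 0.0)


dist = FakeDist({"a": 0.30, "b": 0.15, "c": 0.55})  # made-up numbers

ranking = rank_labels(dist)                    # every label, best first
confident = rank_labels(dist, threshold=0.25)  # only labels with p >= 0.25

for label, p in ranking:
    print(f"{label}: {p:.2f}")
```

With a real classifier, the call would presumably become rank_labels(maxent.prob_classify(feature_sets(new_input)), threshold=...), with the threshold chosen to suit the dataset.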

Recommended answer

Try prob_classify(input)

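Instead of a single label, it returns a probability distribution over the labels (a DictionaryProbDist object for MaxEnt), which you can query for the probability of each label; see the docs.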
