NLTK accuracy: "ValueError: too many values to unpack"


Question

I'm trying to do some sentiment analysis of a new movie from Twitter using the NLTK toolkit. I've followed the NLTK 'movie_reviews' example and I've built my own CategorizedPlaintextCorpusReader object. The problem arises when I call nltk.classify.util.accuracy(classifier, testfeats). Here is the code:

import os
import glob
import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews

def word_feats(words):
        return dict([(word, True) for word in words])

negids = movie_reviews.fileids('neg')
posids = movie_reviews.fileids('pos')

negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids]
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids]

trainfeats = negfeats + posfeats

# Building a custom Corpus Reader
tweets = nltk.corpus.reader.CategorizedPlaintextCorpusReader('./tweets', r'.*\.txt', cat_pattern=r'(.*)\.txt')
tweetsids = tweets.fileids()
testfeats = [(word_feats(tweets.words(fileids=[f]))) for f in tweetsids]

print 'Training the classifier'
classifier = NaiveBayesClassifier.train(trainfeats)

for tweet in tweetsids:
        print tweet + ' : ' + classifier.classify(word_feats(tweets.words(tweetsids)))

classifier.show_most_informative_features()

print 'accuracy:', nltk.classify.util.accuracy(classifier, testfeats)

It all seems to work fine until it gets to the last line. That's when I get the error:

>>> nltk.classify.util.accuracy(classifier, testfeats)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/nltk/classify/util.py", line 87, in accuracy
    results = classifier.classify_many([fs for (fs,l) in gold])
ValueError: too many values to unpack

Does anybody see anything wrong within the code?

Thanks.

Recommended answer

The error message

File "/usr/lib/python2.7/dist-packages/nltk/classify/util.py", line 87, in accuracy
  results = classifier.classify_many([fs for (fs,l) in gold])
ValueError: too many values to unpack

arises because items in gold can not be unpacked into a 2-tuple, (fs,l):

[fs for (fs,l) in gold]  # <-- The ValueError is raised here

It is the same error you would get if gold equals [(1,2,3)], since the 3-tuple (1,2,3) can not be unpacked into a 2-tuple (fs,l):

In [74]: [fs for (fs,l) in [(1,2)]]
Out[74]: [1]
In [73]: [fs for (fs,l) in [(1,2,3)]]
ValueError: too many values to unpack

gold might be buried inside the implementation of nltk.classify.util.accuracy, but this hints that your inputs, classifier or testfeats are of the wrong "shape".
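
One way to confirm a shape problem before calling accuracy is to scan the list for items that are not (dict, label) pairs. The following is only a sketch under assumptions: find_malformed is a hypothetical helper, not part of NLTK, the sample data is made up, and it is written for Python 3 rather than the Python 2 used in the question.

```python
def find_malformed(gold):
    """Return (index, item) for entries that are not (dict, label) 2-tuples.

    Hypothetical helper for illustration; it is not an NLTK function.
    """
    bad = []
    for i, item in enumerate(gold):
        ok = (isinstance(item, tuple)
              and len(item) == 2
              and isinstance(item[0], dict))
        if not ok:
            bad.append((i, item))
    return bad

# Made-up sample data: one well-formed list and one list of bare dicts.
good = [({'a': True}, 'pos')]
broken = [{'a': True, 'b': True, 'c': True}]

print(find_malformed(good))    # nothing to report for (dict, label) pairs
print(find_malformed(broken))  # reports the bare dict at index 0
```

Running such a check on both trainfeats and testfeats would point at whichever list has the unexpected shape.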

There is no problem with classifier, since calling accuracy(classifier, trainfeats) works:

In [61]: print 'accuracy:', nltk.classify.util.accuracy(classifier, trainfeats)
accuracy: 0.9675

So the problem must be with testfeats.

Compare trainfeats with testfeats. trainfeats[0] is a 2-tuple containing a dict and a classification:

In [63]: trainfeats[0]
Out[63]: 
({u'!': True,
  u'"': True,
  u'&': True,
  ...
  u'years': True,
  u'you': True,
  u'your': True},
 'neg')           # <---  Notice the classification, 'neg'

But testfeats[0] is just a dict, word_feats(tweets.words(fileids=[f])):

testfeats = [(word_feats(tweets.words(fileids=[f]))) for f in tweetsids]
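
It may help to see why a bare dict produces this particular message. Unpacking a dict iterates over its keys, so a feature dict with more than two words yields more than two values. Here is a minimal sketch in Python 3 (the question uses Python 2, where the message lacks the "(expected 2)" suffix); the feature dict and the try_unpack helper are made up for illustration:

```python
# Hypothetical feature dict, standing in for one item of testfeats.
features = {'great': True, 'movie': True, 'loved': True}

def try_unpack(item):
    """Try to unpack item the way the comprehension in the traceback does."""
    try:
        (fs, l) = item
    except ValueError as exc:
        return 'ValueError: %s' % exc
    return 'ok'

print(try_unpack((features, 'pos')))  # a (dict, label) pair unpacks cleanly
print(try_unpack(features))           # a bare dict unpacks into its keys
```

Note that a dict with exactly two keys would unpack without any error, which is why this kind of bug can stay hidden on very small inputs.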

So to fix this you would need to define testfeats to look more like trainfeats -- each dict returned by word_feats must be paired with a classification.
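
A rough sketch of that pairing, in Python 3 and without NLTK so it runs on its own. The tweet_words mapping, the file names, and the label_for helper are all assumptions made for illustration; only word_feats comes from the question.

```python
def word_feats(words):
    # Same bag-of-words featurizer as in the question.
    return dict((word, True) for word in words)

# Hypothetical stand-in for the tweets corpus: file id -> tokens.
tweet_words = {
    'pos_001.txt': ['loved', 'it'],
    'neg_001.txt': ['boring', 'plot'],
}

def label_for(fileid):
    # Assumption: the known label is the filename prefix before the underscore.
    return fileid.split('_')[0]

# Pair every feature dict with its label, mirroring the shape of trainfeats.
testfeats = [(word_feats(words), label_for(fileid))
             for fileid, words in sorted(tweet_words.items())]

# Every item now unpacks into (features, label), as accuracy() expects.
for fs, label in testfeats:
    print(label, sorted(fs))
```

With the corpus reader from the question, the label could probably come from tweets.categories(fileids=[f]) instead, provided the cat_pattern yields 'pos' or 'neg'. The pattern used in the question, r'(.*)\.txt', appears to make each file's category its own base name, so it would likely need adjusting. Either way, measuring accuracy requires test tweets whose true labels are already known.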
