Python: Loaded NLTK Classifier not working

Question

I'm trying to train an NLTK classifier for sentiment analysis and then save the classifier using pickle. The freshly trained classifier works fine. However, if I load the saved classifier, it outputs the same label, either 'positive' or 'negative', for ALL examples.

I save the classifier with:

import pickle
import nltk

classifier = nltk.NaiveBayesClassifier.train(training_set)
classifier.classify(words_in_tweet)
with open('classifier.pickle', 'wb') as f:
    pickle.dump(classifier, f)

and load it with:

import pickle

with open('classifier.pickle', 'rb') as f:
    classifier = pickle.load(f)
classifier.classify(words_in_tweet)

I'm not getting any errors. Any idea what the problem could be, or how to debug this correctly?

Answer

The most likely place for a pickled classifier to go wrong is the feature extraction function, which must be used to generate the feature vectors that the classifier works with.

The NaiveBayesClassifier expects feature vectors (in NLTK, featuresets: dicts mapping feature names to values) for both training and classification; your code looks as if you passed the raw words to the classifier instead (but presumably only after unpickling, otherwise you wouldn't get different behavior before and after unpickling). You should store the feature extraction code in a separate file and import it in both the training and the classifying (or testing) script.
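
A minimal sketch of that setup, assuming NLTK is installed. The extract_features function, its contains(...) feature names, and the four training examples are made up for illustration. In a real project the extractor would live in its own module that both scripts import. The pickle round trip happens in memory here rather than through classifier.pickle.

```python
import pickle

import nltk


# Hypothetical shared feature extractor. In practice this would sit in its own
# module (e.g. a features.py) imported by both the training and the
# classifying script.
def extract_features(words):
    """Turn a list of tokens into the {feature_name: value} dict NLTK expects."""
    return {f"contains({w.lower()})": True for w in words}


# Tiny made-up training data, for illustration only.
train_data = [
    (["great", "movie"], "positive"),
    (["loved", "it"], "positive"),
    (["terrible", "movie"], "negative"),
    (["hated", "it"], "negative"),
]
training_set = [(extract_features(words), label) for words, label in train_data]
classifier = nltk.NaiveBayesClassifier.train(training_set)

# Round-trip through pickle (in memory instead of a classifier.pickle file).
loaded = pickle.loads(pickle.dumps(classifier))

# Classify with the SAME extractor, never with the raw word list.
words_in_tweet = ["loved", "this", "movie"]
features = extract_features(words_in_tweet)
print(classifier.classify(features), loaded.classify(features))  # positive positive
```

Because extract_features is applied both before train() and before classify(), the fresh and the unpickled classifier receive identical featuresets and agree. Words never seen in training (here "this") are simply ignored by NaiveBayesClassifier.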

I doubt this applies to the OP, but some NLTK classifiers take the feature extraction function as an argument to the constructor. When you have separate scripts for training and classifying, it can be tricky to ensure that the unpickled classifier finds the same function. This is because of the way pickle works: pickling saves data, not code. A function is stored only as a reference (its module and name), and pickle.load imports it again by that name. To get this to work, put the extraction function in a separate file (module) that both scripts import. If you put it in the "main" script, it is recorded as belonging to __main__, so pickle.load will look for it in whichever script is doing the loading, which is the wrong place.
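
The by-reference behavior can be seen with the standard library alone. This sketch builds a throwaway module at runtime in place of a real file. The module name tweet_features_demo is hypothetical. The sketch pickles a function from it, then shows that loading fails once that module can no longer be imported.

```python
import pickle
import sys
import types

# Build a throwaway module at runtime. It stands in for a hypothetical
# tweet_features_demo.py that the training and classifying scripts would
# both import; the name is made up for this demo.
mod = types.ModuleType("tweet_features_demo")
exec(
    "def extract_features(words):\n"
    "    return {'contains(' + w.lower() + ')': True for w in words}\n",
    mod.__dict__,
)
sys.modules["tweet_features_demo"] = mod

# Pickling a function records only its module and qualified name, not its code.
data = pickle.dumps(mod.extract_features)
print(b"tweet_features_demo" in data, b"extract_features" in data)  # True True

# Unpickling re-imports the module by name and looks the function up again.
restored = pickle.loads(data)
print(restored is mod.extract_features)  # True

# If the loading process cannot import that module, unpickling fails.
del sys.modules["tweet_features_demo"]
load_error = None
try:
    pickle.loads(data)
except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
    load_error = exc
print(type(load_error).__name__)  # ModuleNotFoundError
```

The same lookup explains the __main__ case. A function defined in the script you run directly is pickled as __main__.extract_features, so a different script that loads the pickle only finds it if that script also defines the function at top level.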
