Nltk naive bayesian classifier memory issue
Problem description
My first post here! I have problems using the nltk NaiveBayesClassifier. I have a training set of 7000 items. Each training item has a description of 2 or 3 words and a code. I would like to use the code as the label of the class and each word of the description as a feature. An example:
"My name is Obama", 001 ...
Training set = {[feature['My']=True, feature['name']=True, feature['is']=True, feature['Obama']=True], 001}
Unfortunately, using this approach, the training procedure NaiveBayesClassifier.train uses up to 3 GB of RAM. What's wrong with my approach? Thank you!
from nltk import classify
from nltk.classify import NaiveBayesClassifier

def document_features(document):  # feature extractor
    document = set(document)
    return dict((w, True) for w in document)

...

words = set()
entries = []
train_set = []
train_length = 2000
readfile = open("atcname.pl", 'r')
t = readfile.readline()
while (t != ""):
    t = t.split("'")
    code = t[0]  # class
    desc = t[1]  # description
    s = desc.split()  # words of the description
    words = words.union(s)  # update dictionary with the new words in the description
    entries.append((s, code))
    t = readfile.readline()
train_set = classify.util.apply_features(document_features, entries[:train_length])
classifier = NaiveBayesClassifier.train(train_set)  # Training
Recommended answer
Use nltk.classify.apply_features, which returns an object that acts like a list but does not store all the feature sets in memory.
from nltk.classify import apply_features
More information and examples here.
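A minimal sketch of this idea, assuming toy labeled data and a document_features helper in the same style as the question (neither is from the original post): apply_features wraps the labeled word lists in a lazy, list-like view, so each feature dict is built on demand during training rather than stored up front.

from nltk.classify import apply_features, NaiveBayesClassifier

def document_features(document):  # extractor in the same style as the question
    return dict((w, True) for w in set(document))

# toy labeled data: (list of words, class code)
entries = [(["My", "name", "is", "Obama"], "001"),
           (["My", "name", "is", "Bush"], "002")]

# apply_features returns a lazy, list-like object; feature dicts are
# computed on demand instead of being materialized all at once
train_set = apply_features(document_features, entries)
classifier = NaiveBayesClassifier.train(train_set)
print(classifier.classify(document_features(["My", "name", "is", "Obama"])))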
You are also loading the whole file into memory anyway; you will need some form of lazy loading that reads data on an as-needed basis. Consider looking into this.
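As a sketch of one such lazy approach, assuming the same atcname.pl format as in the question (class code and description separated by a single quote), the file can be read through a generator so that only the entries actually needed are kept around:

from itertools import islice
from nltk.classify import apply_features, NaiveBayesClassifier

def read_entries(path):
    # yield (word_list, code) pairs one line at a time instead of
    # building the full list of 7000 items up front
    with open(path, 'r') as f:
        for line in f:
            parts = line.split("'")
            if len(parts) < 2:
                continue  # skip malformed lines
            code, desc = parts[0], parts[1]
            yield (desc.split(), code)

def document_features(document):
    return dict((w, True) for w in set(document))

# materialize only the first 2000 entries, then let apply_features
# build each feature set lazily during training
entries = list(islice(read_entries("atcname.pl"), 2000))
train_set = apply_features(document_features, entries)
classifier = NaiveBayesClassifier.train(train_set)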