Nltk naive bayesian classifier memory issue


Problem description

My first post here! I have a problem using the nltk NaiveBayesClassifier. I have a training set of 7000 items. Each training item has a description of 2 or 3 words and a code. I would like to use the code as the label of the class and each word of the description as a feature. An example:

"My name is Obama", 001 ...

Training set = {[feature['My']=True, feature['name']=True, feature['is']=True, feature['Obama']=True], 001}

Unfortunately, using this approach, the training procedure NaiveBayesClassifier.train uses up to 3 GB of RAM. What's wrong with my approach? Thank you!

from nltk import classify
from nltk.classify import NaiveBayesClassifier

def document_features(document):  # feature extractor
    document = set(document)
    return dict((w, True) for w in document)

...
words = set()
entries = []
train_set = []
train_length = 2000
readfile = open("atcname.pl", 'r')
t = readfile.readline()
while t != "":
    t = t.split("'")
    code = t[0]             # class label
    desc = t[1]             # description
    s = desc.split()        # (assumed) tokenize the description into words
    words = words.union(s)  # update the vocabulary with the new words in the description
    entries.append((s, code))
    t = readfile.readline()
train_set = classify.util.apply_features(document_features, entries[:train_length])
classifier = NaiveBayesClassifier.train(train_set)  # training

Recommended answer

Use nltk.classify.apply_features, which returns an object that acts like a list but does not store all the feature sets in memory.

from nltk.classify import apply_features

More information and an example can be found in the NLTK documentation.
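
A minimal sketch of how this looks with the names from the question (document_features, entries, train_length are the asker's); the eager variant in the comment is shown only for contrast:

from nltk.classify import NaiveBayesClassifier, apply_features

# Eager version: builds every feature dict up front and keeps them all in memory.
# train_set = [(document_features(tokens), label) for (tokens, label) in entries[:train_length]]

# Lazy version: apply_features returns a list-like LazyMap that computes each
# feature dict only when the classifier iterates over it.
train_set = apply_features(document_features, entries[:train_length])
classifier = NaiveBayesClassifier.train(train_set)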

You are also loading the whole file into memory anyway, so you will need some form of lazy loading that reads data only as it is needed. Consider looking into a lazy-loading approach.
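
A minimal sketch of such lazy loading, assuming atcname.pl keeps the code'description format from the question; read_entries is an illustrative helper, not part of the original code:

from itertools import islice

def read_entries(path):
    # Yield (tokens, code) pairs one line at a time instead of building a full list.
    with open(path, 'r') as f:
        for line in f:
            parts = line.split("'")
            if len(parts) < 2:
                continue  # skip malformed lines
            code, desc = parts[0], parts[1]
            yield (desc.split(), code)

# Only materialize the first 2000 entries; later lines are never read.
entries = list(islice(read_entries("atcname.pl"), 2000))

These entries can then be passed to apply_features exactly as above.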
