Nltk naive bayesian classifier memory issue


Problem description

My first post here! I have a problem using the nltk NaiveBayesClassifier. I have a training set of 7000 items. Each training item has a description of 2 or 3 words and a code. I would like to use the code as the label of the class and each word of the description as a feature. An example:

"My name is Obama", 001 ...

Training set = {[feature['My']=True, feature['name']=True, feature['is']=True, feature['Obama']=True], 001}

Unfortunately, using this approach, the training procedure NaiveBayesClassifier.train uses up to 3 GB of RAM. What's wrong with my approach? Thank you!

from nltk import classify
from nltk.classify import NaiveBayesClassifier

def document_features(document):  # feature extractor
    document = set(document)
    return dict((w, True) for w in document)

...
words = set()
entries = []
train_set = []
train_length = 2000
readfile = open("atcname.pl", 'r')
t = readfile.readline()
while t != "":
    t = t.split("'")
    code = t[0]             # class label
    desc = t[1]             # description
    s = desc.split()        # (assumed) tokenize the description into words
    words = words.union(s)  # update the vocabulary with the new words in the description
    entries.append((s, code))
    t = readfile.readline()
train_set = classify.util.apply_features(document_features, entries[:train_length])
classifier = NaiveBayesClassifier.train(train_set)  # training

Recommended answer

Use nltk.classify.apply_features, which returns an object that acts like a list but does not store all the feature sets in memory.

from nltk.classify import apply_features

More information and an example can be found in the NLTK documentation.
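
A minimal sketch of how this looks with the names from the question (document_features, entries, train_length are the asker's); the eager variant in the comment is shown only for contrast:

from nltk.classify import NaiveBayesClassifier, apply_features

# Eager version: builds every feature dict up front and keeps them all in memory.
# train_set = [(document_features(tokens), label) for (tokens, label) in entries[:train_length]]

# Lazy version: apply_features returns a list-like LazyMap that computes each
# feature dict only when the classifier iterates over it.
train_set = apply_features(document_features, entries[:train_length])
classifier = NaiveBayesClassifier.train(train_set)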

You are also loading the whole file into memory anyway, so you will need some form of lazy loading that reads data only as it is needed. Consider looking into a lazy-loading approach.
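
A minimal sketch of such lazy loading, assuming atcname.pl keeps the code'description format from the question; read_entries is an illustrative helper, not part of the original code:

from itertools import islice

def read_entries(path):
    # Yield (tokens, code) pairs one line at a time instead of building a full list.
    with open(path, 'r') as f:
        for line in f:
            parts = line.split("'")
            if len(parts) < 2:
                continue  # skip malformed lines
            code, desc = parts[0], parts[1]
            yield (desc.split(), code)

# Only materialize the first 2000 entries; later lines are never read.
entries = list(islice(read_entries("atcname.pl"), 2000))

These entries can then be passed to apply_features exactly as above.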
