IndexError while using Gensim package for LDA Topic Modelling

Problem description

I have a total of 54,892 documents containing 360,331 unique tokens. The length of the dictionary is 88.

import gensim
from gensim import corpora

mm = corpora.MmCorpus('PRC.mm')
dictionary = corpora.Dictionary('PRC.dict')
lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=dictionary, num_topics=50, update_every=0, chunksize=19188, passes=650)

Whenever I run this script, I get this error:

Traceback (most recent call last):
  File "C:\Users\modelDeTopics.py", line 19, in <module>
    lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=dictionary, num_topics=50, update_every=0, chunksize=19188, passes=650)
  File "C:\Python27\lib\site-packages\gensim-0.8.6-py2.7.egg\gensim\models\ldamodel.py", line 265, in __init__
    self.update(corpus)
  File "C:\Python27\lib\site-packages\gensim-0.8.6-py2.7.egg\gensim\models\ldamodel.py", line 445, in update
    self.do_estep(chunk, other)
  File "C:\Python27\lib\site-packages\gensim-0.8.6-py2.7.egg\gensim\models\ldamodel.py", line 365, in do_estep
    gamma, sstats = self.inference(chunk, collect_sstats=True)
  File "C:\Python27\lib\site-packages\gensim-0.8.6-py2.7.egg\gensim\models\ldamodel.py", line 318, in inference
    expElogbetad = self.expElogbeta[:, ids]
IndexError: index 8 is out of bounds for axis 1 with size 8
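The last frame of the traceback is a NumPy fancy-indexing operation, `self.expElogbeta[:, ids]`. Assuming, as the shape in the message suggests, that this matrix has one column per term in `id2word`, the failure can be reproduced with plain NumPy: any term id equal to or larger than the number of columns raises the same kind of error. The sizes and ids here are illustrative only, chosen to match the traceback.

```python
import numpy as np

# Illustrative sizes: 50 topics as in the question, and 8 columns to match
# "size 8" in the traceback (one column per term id 0..7).
num_topics, num_terms = 50, 8
exp_elog_beta = np.ones((num_topics, num_terms))

# Term ids of one hypothetical document; id 8 has no matching column.
ids = [3, 8]

message = None
try:
    exp_elog_beta[:, ids]  # same indexing pattern as the failing line
except IndexError as err:
    message = str(err)

print(message)
```

Read this way, the error says a document in the corpus refers to term id 8 while the model only has columns for ids 0 to 7, which points at a mismatch between the corpus and the dictionary rather than at a shortage of RAM.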

I checked online, and some sources mention that it might be related to the amount of RAM the computer has. I am using Windows 7 32-bit with 4 GB of RAM. What should I change in the script?

Please help!

Recommended answer

Looks like a problem with your dictionary. 88 unique words doesn't sound reasonable.
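One way to make that suspicion concrete: in the posted script, `corpora.Dictionary('PRC.dict')` calls the `Dictionary` constructor, which builds a new dictionary from an iterable of tokenized documents, whereas a dictionary saved earlier is normally read back with `corpora.Dictionary.load('PRC.dict')`. (If the string itself were consumed as documents, its characters would become the tokens; `'PRC.dict'` has 8 distinct characters, which would fit the `size 8` in the traceback, but that is a guess about this old gensim version.) The pure-Python sketch below uses hypothetical helpers, `max_term_id` and `check_corpus_against_dictionary`, to check before training that every term id in a bag-of-words corpus has an entry in the dictionary. It assumes term ids run from 0 to `len(id2word) - 1`, as they do in a gensim `Dictionary`.

```python
def max_term_id(corpus):
    """Return the largest term id used in a bag-of-words corpus, or -1 if none."""
    largest = -1
    for doc in corpus:  # each doc is a list of (term_id, weight) pairs
        for term_id, _weight in doc:
            largest = max(largest, int(term_id))
    return largest


def check_corpus_against_dictionary(corpus, id2word):
    """Raise ValueError if the corpus refers to ids that id2word cannot map.

    Assumes contiguous ids 0..len(id2word)-1, as in a gensim Dictionary.
    """
    largest = max_term_id(corpus)
    if largest >= len(id2word):
        raise ValueError(
            f"corpus uses term id {largest}, "
            f"but id2word has only {len(id2word)} entries"
        )
    return largest


# With gensim (not executed here), the saved files would be loaded as
#   mm = corpora.MmCorpus('PRC.mm')
#   dictionary = corpora.Dictionary.load('PRC.dict')
#   check_corpus_against_dictionary(mm, dictionary)
# before constructing the LdaModel.

# Hypothetical toy data: the second document uses term id 8.
toy_corpus = [[(0, 2), (3, 1)], [(8, 1)]]
too_small = {i: f"word{i}" for i in range(8)}   # ids 0..7 only
big_enough = {i: f"word{i}" for i in range(9)}  # ids 0..8

try:
    check_corpus_against_dictionary(toy_corpus, too_small)
except ValueError as err:
    print("mismatch:", err)

print("largest id:", check_corpus_against_dictionary(toy_corpus, big_enough))
```

The check streams the corpus once, so on a large `MmCorpus` it costs roughly one extra pass, which is small next to the 650 training passes in the question.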

Posting a full log would reveal more.
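To produce that full log, gensim's progress messages need to be switched on: gensim reports through Python's standard `logging` module, which by default shows only warnings and above. A minimal setup, placed at the top of the script before the corpus and model are created (the format string is the one used in gensim's tutorials):

```python
import logging

# Show INFO-level messages with timestamps; gensim modules log through
# loggers named after their module path (e.g. "gensim.models.ldamodel").
logging.basicConfig(
    format="%(asctime)s : %(levelname)s : %(message)s",
    level=logging.INFO,
)

# Also set the level on the "gensim" logger itself, so INFO records are not
# dropped at the logger even if the root logger was configured elsewhere.
logging.getLogger("gensim").setLevel(logging.INFO)
```

With this in place, the console output while the corpus, dictionary, and `LdaModel` are being built should include gensim's own progress messages, which is the kind of log that would show what the dictionary actually contains.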
