IndexError when trying to update gensim's LdaModel


Problem description

I am facing the following error when trying to update my gensim LdaModel:

IndexError: index 6614 is out of bounds for axis 1 with size 6614

I checked why other people were having this issue in this thread, but I am using the same dictionary from beginning to end, which was their mistake.

As I have a big dataset, I am loading it chunk by chunk (using pickle.load). I am building the dictionary iteratively with this piece of code:

import pickle
from time import time
from gensim.corpora import Dictionary

fr_documents_lda = open("documents_lda_40_rails_30_ruby_full.dat", 'rb')
dictionary = Dictionary()
chunk_no = 0
while 1:
    try:
        t0 = time()
        documents_lda = pickle.load(fr_documents_lda)
        chunk_no += 1
        dictionary.add_documents(documents_lda)
        t1 = time()
        print("Chunk number {0} took {1:.2f}s".format(chunk_no, t1-t0))
    except EOFError:
        print("Finished going through pickle")
        break

Once the dictionary is built for the whole dataset, I train the model in the same iterative fashion:

from gensim.models import LdaModel

# dictionary comes from the loop above; no_topics is defined elsewhere
fr_documents_lda = open("documents_lda_40_rails_30_ruby_full.dat", 'rb')
first_iter = True
chunk_no = 0
lda_gensim = None
while 1:
    try:
        t0 = time()
        documents_lda = pickle.load(fr_documents_lda)
        chunk_no += 1
        corpus = [dictionary.doc2bow(text) for text in documents_lda]
        if first_iter:
            first_iter = False
            lda_gensim = LdaModel(corpus, num_topics=no_topics, iterations=100, offset=50., random_state=0, alpha='auto')
        else:
            lda_gensim.update(corpus)
        t1 = time()
        print("Chunk number {0} took {1:.2f}s".format(chunk_no, t1-t0))
    except EOFError:
        print("Finished going through pickle")
        break

I also tried updating the dictionary at every chunk, i.e. having

dictionary.add_documents(documents_lda)

right before

corpus = [dictionary.doc2bow(text) for text in documents_lda]

in the last piece of code. Finally, I tried setting the allow_update argument of doc2bow to True. Nothing works.
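
For reference, those attempted variants look roughly like this (a sketch of the changes described above, not the eventual fix):

# Variant 1: grow the dictionary with each chunk before vectorizing it
dictionary.add_documents(documents_lda)
corpus = [dictionary.doc2bow(text) for text in documents_lda]

# Variant 2: let doc2bow extend the dictionary on the fly
corpus = [dictionary.doc2bow(text, allow_update=True) for text in documents_lda]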

FYI, the size of my final dictionary is 85k. The size of the dictionary built only from the first chunk is 10k. The error occurs on the second iteration, when it enters the else branch and calls the update method.

The error is raised by the line expElogbetad = self.expElogbeta[:, ids], called by gamma, sstats = self.inference(chunk, collect_sstats=True), itself called by gammat = self.do_estep(chunk, other), itself called by lda_gensim.update(corpus).
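
A quick way to see the mismatch behind that traceback (a hypothetical check, assuming the variables from the loops above):

# The model's topic-term matrix was sized from the first chunk's corpus
# (axis 1 has size 6614 here), but doc2bow uses the full dictionary, so a
# later chunk can emit term ids >= 6614 and index past that axis.
print(lda_gensim.expElogbeta.shape)              # second dimension is 6614
print(max(i for doc in corpus for i, _ in doc))  # can be >= 6614 on later chunks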

Does anyone have an idea how to fix this, or what is happening?

Thanks.

Answer

The solution is simply to initialize the LdaModel with the argument id2word=dictionary.
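
Applied to the question's training loop, the first-iteration branch would look roughly like this (a sketch assuming the same variables; only id2word is new):

# Pass the full dictionary so the model's vocabulary covers every chunk,
# not just the terms seen in the first corpus it was created from.
lda_gensim = LdaModel(corpus,
                      num_topics=no_topics,
                      id2word=dictionary,   # full Dictionary built over all chunks
                      iterations=100, offset=50., random_state=0, alpha='auto')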

If you don't do that, it assumes that your vocabulary size is the vocabulary size of the first set of documents you train it on, and can't update it. In fact, it sets its num_terms value to the length of id2word once there, and never updates it afterwards (you can verify this in the update function).
