gensim/models/ldaseqmodel.py:217: RuntimeWarning: divide by zero encountered in double_scalars


Problem description

/Users/Barry/anaconda/lib/python2.7/site-packages/gensim/models/ldaseqmodel.py:217: RuntimeWarning: divide by zero encountered in double_scalars convergence = np.fabs((bound - old_bound) / old_bound)

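# Dynamic topic model (Python 2.7, as in the traceback path above)
from collections import Counter

from gensim import corpora
from gensim.models import ldaseqmodel


def run_dtm(num_topics=18):
    # preprocessing() is the asker's own helper (not shown); it is assumed
    # to return tokenized documents, their years, and their titles.
    docs, years, titles = preprocessing(datasetType=2)

    # Re-sort the documents by year
    Z = zip(years, docs)
    Z = sorted(Z, reverse=False)
    years_new, docs_new = zip(*Z)

    # Generate the time slices: documents per year, in chronological order.
    # Iterating a Counter directly does not guarantee that order on Python 2.
    year_counts = Counter(years_new)
    time_slice = [year_counts[year] for year in sorted(year_counts)]

    for year in sorted(year_counts):
        print('{} --- {}'.format(year, year_counts[year]))

    print('********* data set loaded ********')
    dictionary = corpora.Dictionary(docs_new)
    corpus = [dictionary.doc2bow(text) for text in docs_new]

    print('********* train lda seq model ********')
    ldaseq = ldaseqmodel.LdaSeqModel(corpus=corpus, id2word=dictionary, time_slice=time_slice, num_topics=num_topics)

    print('********* lda seq model done ********')
    ldaseq.print_topics(time=1)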

Hey guys, I'm using the dynamic topic model in the gensim package for topic analysis, following this tutorial: https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/ldaseqmodel.ipynb. However, I always get the same unexpected warning. Can anyone give me some guidance? I'm really puzzled, even though I have tried several different datasets for generating the corpus and dictionary. The message is:

/Users/Barry/anaconda/lib/python2.7/site-packages/gensim/models/ldaseqmodel.py:217: RuntimeWarning: divide by zero encountered in double_scalars convergence = np.fabs((bound - old_bound) / old_bound)

Recommended answer

The problem lies in the source code of ldaseqmodel.py itself. With gensim 3.8.3 (the latest release as of this answer) I get the same warning at line 293:

ldaseqmodel.py:293: RuntimeWarning: divide by zero encountered in double_scalars
  convergence = np.fabs((bound - old_bound) / old_bound)

If you read the source at that line, you will see that it divides the difference between bound and old_bound by old_bound, which is exactly the expression shown in the warning.

Looking further, you will see that old_bound is initialized to zero at line 263. The first convergence check therefore divides by zero, and that is the main reason you get the "divide by zero encountered" warning.
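The mechanism described above can be reproduced outside gensim. The following is a minimal sketch, assuming the bounds are NumPy floating-point scalars; the function name `relative_change` and the sample value are hypothetical, and only the expression itself is taken from the warning text.

```python
import warnings

import numpy as np


def relative_change(bound, old_bound):
    # Same expression as the line quoted in the warning.
    return np.fabs((bound - old_bound) / old_bound)


# Hypothetical first iteration: old_bound still holds its initial value of zero.
bound = np.float64(-1234.5)
old_bound = np.float64(0.0)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    convergence = relative_change(bound, old_bound)

# NumPy emits a RuntimeWarning instead of raising, and the result is infinite.
print(convergence)
print([str(w.message) for w in caught])
```

The exact wording of the message depends on the NumPy version (older releases mention double_scalars), so the sketch prints whatever was captured rather than assuming a particular string.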

To confirm this, I put a print statement at line 294:

print('bound = {}, old_bound = {}'.format(bound, old_bound))

The output I received was: [screenshot not reproduced]

In short, you get this warning because of the source code of ldaseqmodel.py, not because of any empty document. However, if you do not remove empty documents from your corpus, you will receive a different warning. So I suggest removing any empty documents from your corpus and simply ignoring the division-by-zero warning above.
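Both suggestions can be sketched in a few lines. This is an illustration under stated assumptions, not part of gensim: `drop_empty_documents` is a hypothetical helper, the corpus is a plain list of bag-of-words lists in time order, and the sample data is made up. Because removing documents changes how many fall in each period, the helper also recomputes the time-slice counts.

```python
import warnings


def drop_empty_documents(corpus, time_slice):
    """Remove empty bag-of-words documents and shrink each slice count to match."""
    kept_corpus, kept_slices = [], []
    start = 0
    for size in time_slice:
        # Keep only the non-empty documents that belong to this time slice.
        chunk = [doc for doc in corpus[start:start + size] if len(doc) > 0]
        kept_corpus.extend(chunk)
        kept_slices.append(len(chunk))
        start += size
    return kept_corpus, kept_slices


# Made-up corpus: five documents, two of them empty, split into two periods.
corpus = [[(0, 1)], [], [(1, 2)], [(0, 1), (2, 1)], []]
time_slice = [2, 3]
clean_corpus, clean_slices = drop_empty_documents(corpus, time_slice)
print(clean_slices)

# Silence only the division-by-zero RuntimeWarning discussed above.
warnings.filterwarnings("ignore", message="divide by zero", category=RuntimeWarning)
```

If a period ends up with no documents at all, its count becomes zero; whether the model accepts a zero-length slice is not verified here, so dropping such a period before training is probably the safer choice.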
