gensim.interfaces.TransformedCorpus - How do I use it?
Question
I'm relatively new to Latent Dirichlet Allocation. I can generate an LDA model by following the Wikipedia tutorial, and I can also generate an LDA model from my own documents. My next step is to understand how to use a previously generated model to classify unseen documents. I'm saving my "lda_wiki_model" with:
import gensim

id2word = gensim.corpora.Dictionary.load_from_text('ptwiki_wordids.txt.bz2')
mm = gensim.corpora.MmCorpus('ptwiki_tfidf.mm')
lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=100, update_every=1, chunksize=10000, passes=1)
lda.save('lda_wiki_model.lda')
And I'm loading the same model with:
new_lda = gensim.models.LdaModel.load(path + 'lda_wiki_model.lda')  # load the model
I have a "new_doc.txt". I turned this document into an id <-> term dictionary and converted the tokenized document into a document-term matrix.
But when I run
new_topics = new_lda[corpus]
I get a
'gensim.interfaces.TransformedCorpus object at 0x7f0ecfa69d50'
How can I extract the topics from it?
I have already tried
lsa = models.LdaModel(new_topics, id2word=dictionary, num_topics=1, passes=2)
corpus_lda = lsa[new_topics]
print(lsa.print_topics(num_topics=1, num_words=7))
and
print(corpus_lda.print_topics(num_topics=1, num_words=7))
but that returns topics unrelated to my new document. Where is my mistake? Am I misunderstanding something?
If I train a new model using the dictionary and corpus created above, I get the correct topics. My point is: how do I reuse my model? Is it correct to reuse that wiki_model this way?
Thanks.
Recommended answer
I was facing the same problem. This code will solve your problem:
new_topics = new_lda[corpus]
for topic in new_topics:
    print(topic)
For each document in the corpus, this prints a list of tuples of the form (topic number, probability).
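To illustrate why printing the result shows only an object while looping over it shows the topics, here is a minimal sketch in plain Python. It is a toy stand-in, not gensim's actual implementation: the class LazyTransformedCorpus and the function toy_topics are hypothetical names invented for this example, and the "model" is a deliberately silly rule (share of even versus odd word ids). The only point it makes is that a lazily transformed corpus computes nothing until it is iterated.

```python
class LazyTransformedCorpus:
    """Toy stand-in for a lazily transformed corpus (not gensim's code)."""

    def __init__(self, transform, corpus):
        self.transform = transform  # function applied to one document
        self.corpus = corpus        # any re-iterable collection of documents

    def __iter__(self):
        # Documents are transformed one at a time, only when requested.
        for doc in self.corpus:
            yield self.transform(doc)


def toy_topics(bow):
    """Hypothetical 'model': share of even vs. odd word ids in a bag of words."""
    total = sum(count for _, count in bow)
    even = sum(count for word_id, count in bow if word_id % 2 == 0)
    return [(0, even / total), (1, (total - even) / total)]


# Two documents in bag-of-words form: lists of (word id, count) pairs.
corpus = [[(0, 2), (1, 2)], [(2, 3), (5, 1)]]
lazy = LazyTransformedCorpus(toy_topics, corpus)

print(lazy)            # only the object repr; nothing has been computed yet
results = list(lazy)   # iterating (or calling list()) triggers the computation
for doc_topics in results:
    print(doc_topics)  # one list of (topic number, probability) per document
```

The same pattern is what the loop above relies on: iterate over the transformed corpus, or wrap it in list() if the corpus is small enough to hold in memory. Separately, the word ids in corpus have to come from the same dictionary the model was trained with (id2word in the question). If the dictionary for new_doc.txt was built from that file alone, which the question suggests but does not confirm, the ids may not match the model's vocabulary.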