How to print the topics of an LDA model from gensim? (Python)

Problem description

Using gensim, I was able to extract topics from a set of documents with LSA, but how do I access the topics generated by the LDA model?

When iterating over lda.print_topics(10), the code fails with the following error, because print_topics() returns None:

Traceback (most recent call last):
  File "/home/alvas/workspace/XLINGTOP/xlingtop.py", line 93, in <module>
    for top in lda.print_topics(2):
TypeError: 'NoneType' object is not iterable
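This is the error Python raises whenever a for loop is handed None. A minimal sketch of the mechanism, assuming a method that prints its topics but has no return statement; print_only_topics is a hypothetical name for illustration, not part of gensim:

```python
def print_only_topics(topics):
    """Hypothetical method that prints each topic but returns nothing.

    A Python function without a return statement evaluates to None.
    """
    for topic in topics:
        print(topic)


result = print_only_topics(["topic 0", "topic 1"])

error_message = None
try:
    # Looping over the return value fails, like `for top in lda.print_topics(2)`.
    for top in result:
        print(top)
except TypeError as exc:
    error_message = str(exc)

print(result)         # None
print(error_message)  # 'NoneType' object is not iterable
```

The topics still appear on the console, but the caller gets nothing it can loop over.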

The code:

from gensim import corpora, models, similarities
from gensim.models import hdpmodel, ldamodel
from itertools import izip

documents = ["Human machine interface for lab abc computer applications",
              "A survey of user opinion of computer system response time",
              "The EPS user interface management system",
              "System and human system engineering testing of EPS",
              "Relation of user perceived response time to error measurement",
              "The generation of random binary unordered trees",
              "The intersection graph of paths in trees",
              "Graph minors IV Widths of trees and well quasi ordering",
              "Graph minors A survey"]

# remove common words and tokenize
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

# remove words that appear only once
all_tokens = sum(texts, [])
tokens_once = set(word for word in set(all_tokens) if all_tokens.count(word) == 1)
texts = [[word for word in text if word not in tokens_once]
         for text in texts]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

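# I can print out the topics for LSA
# (LSI is trained on a TF-IDF weighted version of the bag-of-words corpus)
tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]
lsi = models.LsiModel(corpus_tfidf, id2word=dictionary, num_topics=2)
corpus_lsi = lsi[corpus_tfidf]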

for l,t in izip(corpus_lsi,corpus):
  print l,"#",t
print
for top in lsi.print_topics(2):
  print top

# I can print out the documents and which is the most probable topics for each doc.
lda = ldamodel.LdaModel(corpus, id2word=dictionary, num_topics=50)
corpus_lda = lda[corpus]

for l,t in izip(corpus_lda,corpus):
  print l,"#",t
print

# But I am unable to print out the topics, how should I do it?
for top in lda.print_topics(10):
  print top
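The preprocessing above (stop-word removal, dropping words that occur only once, then doc2bow) can be sketched with the standard library alone, which makes it easy to inspect what the LDA model is actually trained on. This is an illustration under assumptions, not gensim's implementation: tokenize, drop_singletons and to_bow are hypothetical helper names, and ids are assigned in first-seen order, which need not match the ids corpora.Dictionary assigns. collections.Counter counts every token in a single pass instead of calling list.count() once per distinct word.

```python
from collections import Counter

DOCUMENTS = [
    "Human machine interface for lab abc computer applications",
    "A survey of user opinion of computer system response time",
    "The EPS user interface management system",
    "System and human system engineering testing of EPS",
    "Relation of user perceived response time to error measurement",
    "The generation of random binary unordered trees",
    "The intersection graph of paths in trees",
    "Graph minors IV Widths of trees and well quasi ordering",
    "Graph minors A survey",
]
STOPLIST = set("for a of the and to in".split())


def tokenize(documents, stoplist):
    """Lowercase each document, split on whitespace and drop stop words."""
    return [[w for w in doc.lower().split() if w not in stoplist]
            for doc in documents]


def drop_singletons(texts):
    """Remove tokens that occur only once across the whole collection."""
    counts = Counter(w for text in texts for w in text)  # one pass over all tokens
    return [[w for w in text if counts[w] > 1] for text in texts]


def to_bow(texts):
    """Give each token an integer id and turn each text into sorted (id, count) pairs.

    Ids follow first-seen order (hypothetical; gensim's Dictionary may differ).
    """
    token2id = {}
    for text in texts:
        for w in text:
            token2id.setdefault(w, len(token2id))
    corpus = [sorted(Counter(token2id[w] for w in text).items()) for text in texts]
    return token2id, corpus


texts = drop_singletons(tokenize(DOCUMENTS, STOPLIST))
token2id, corpus = to_bow(texts)

print(len(token2id))  # 12
print(texts[3])       # ['system', 'human', 'system', 'eps']
print(corpus[3])      # [(0, 1), (5, 2), (8, 1)]
```

On these nine documents only 12 distinct words survive the filtering (human, interface, computer, survey, user, system, response, time, eps, trees, graph, minors). Since 1/12 ≈ 0.083, this is consistent with the evenly weighted topic printed in the answer.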

Solution

After some experimenting, it seems that print_topics(numoftopics) in ldamodel has a bug: it returns None instead of the topics, so its result cannot be iterated over. My workaround is to call print_topic(topicid) for each topic id:

>>> print lda.print_topics()
None
>>> for i in range(lda.num_topics):
...     print lda.print_topic(i)
0.083*response + 0.083*interface + 0.083*time + 0.083*human + 0.083*user + 0.083*survey + 0.083*computer + 0.083*eps + 0.083*trees + 0.083*system
...
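The workaround can be wrapped so the topics come back as a list that a for loop can iterate, which is what the question's `for top in lda.print_topics(10)` expected. A sketch under assumptions: collect_topics, format_topic and StubTopicModel are hypothetical names. The stub stands in for the LdaModel so the snippet runs without gensim, and format_topic only mirrors the `0.083*response + ...` format shown above (three decimals, terms joined by ` + `); it is not gensim's own code.

```python
def format_topic(pairs):
    """Join (weight, word) pairs as 'weight*word' terms with three decimals,
    mirroring the '0.083*response + 0.083*interface' output shown above."""
    return " + ".join("%.3f*%s" % (weight, word) for weight, word in pairs)


def collect_topics(model):
    """Return one string per topic instead of printing them.

    Assumes `model` has a `num_topics` attribute and a `print_topic(topicid)`
    method that returns a string, as used in the workaround above.
    range(model.num_topics) yields every id from 0 to num_topics - 1.
    """
    return [model.print_topic(i) for i in range(model.num_topics)]


class StubTopicModel(object):
    """Hypothetical stand-in for an LdaModel so this sketch runs without gensim."""

    def __init__(self, topics):
        self.topics = topics            # one list of (weight, word) pairs per topic
        self.num_topics = len(topics)

    def print_topic(self, topicid):
        return format_topic(self.topics[topicid])


model = StubTopicModel([
    [(1 / 12.0, "response"), (1 / 12.0, "interface")],
    [(0.5, "graph"), (0.25, "trees")],
])

for top in collect_topics(model):
    print(top)
# 0.083*response + 0.083*interface
# 0.500*graph + 0.250*trees
```

Passing `lda` instead of the stub would give a list with one entry per topic, including the last topic id.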
