Topic modelling - Assign a document with top 2 topics as category label - sklearn Latent Dirichlet Allocation
Question
I am currently working through the LDA (Latent Dirichlet Allocation) topic modelling method to extract topics from a set of documents. From what I have understood from the link below, this is an unsupervised learning approach that can categorize / label each document with the extracted topics.
The sample code given in that link defines a function to get the top words associated with each identified topic.
sklearn.__version__
Out[41]: '0.17'
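from sklearn.decomposition import LatentDirichletAllocation

def print_top_words(model, feature_names, n_top_words):
    # Each row of model.components_ holds one topic's weights over the vocabulary.
    for topic_idx, topic in enumerate(model.components_):
        print("Topic #%d:" % topic_idx)
        # argsort is ascending, so this slice picks the n_top_words largest weights, highest first.
        print(" ".join([feature_names[i]
                        for i in topic.argsort()[:-n_top_words - 1:-1]]))
    print()

# lda, tf_vectorizer and n_top_words are assumed to be defined earlier, as in the linked sample code.
print("\nTopics in LDA model:")
tf_feature_names = tf_vectorizer.get_feature_names()
print_top_words(lda, tf_feature_names, n_top_words)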
My question is this: does the fitted LDA model have any component or matrix from which we can get the document-topic association?
For example, I need to find the top 2 topics associated with each document, to use as that document's label / category. Is there any component that gives the distribution of topics within a document, similar to model.components_ for the distribution of words within a topic?
Recommended answer
You can compute the document-topic association using the transform(X) method of the LatentDirichletAllocation class.
With the example code, this would be:
doc_topic_distrib = lda.transform(tf)
where lda is the fitted LDA model and tf is the input data you want to transform.
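To go from that matrix to the top 2 topics per document, one option is to rank the topic weights within each row. The following is only a minimal sketch, assuming doc_topic_distrib is a 2-D array with one row per document and one column per topic. The helper top_topics and the example weights are hypothetical and not part of sklearn.

```python
def top_topics(doc_topic_distrib, n_top=2):
    """Return, for each document, the indices of its n_top highest-weighted topics."""
    labels = []
    for row in doc_topic_distrib:
        weights = list(row)
        # Sort topic indices by weight, largest first, and keep the first n_top.
        ranked = sorted(range(len(weights)), key=lambda k: weights[k], reverse=True)
        labels.append(ranked[:n_top])
    return labels


# Hypothetical document-topic weights for 3 documents and 4 topics,
# standing in for the array returned by lda.transform(tf).
example_distrib = [
    [0.10, 0.60, 0.25, 0.05],
    [0.70, 0.05, 0.05, 0.20],
    [0.05, 0.15, 0.30, 0.50],
]

example_labels = top_topics(example_distrib)
for doc_idx, topics in enumerate(example_labels):
    print("Doc #%d: top topics %s" % (doc_idx, topics))
```

With the code from the question this would presumably be called as top_topics(lda.transform(tf)), and the returned indices should line up with the topic_idx values printed by print_top_words. Only the ranking within each row is used, so the result does not depend on whether the rows sum to 1. For large corpora, np.argsort(-doc_topic_distrib, axis=1)[:, :2] is a vectorized way to get the same ranking, although ties may be ordered differently.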