Topic modelling: assign each document its top 2 topics as category labels (sklearn Latent Dirichlet Allocation)

Question

I am working through the LDA (Latent Dirichlet Allocation) topic modelling method to extract topics from a set of documents. From what I understood from the link below, this is an unsupervised learning approach that can be used to categorize / label each document with the extracted topics.

Topic extraction with Non-negative Matrix Factorization and Latent Dirichlet Allocation

The sample code in that link defines a function that prints the top words associated with each identified topic.

sklearn.__version__

Out[41]: '0.17'

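from sklearn.decomposition import LatentDirichletAllocation

# tf_vectorizer (a fitted CountVectorizer), lda (a fitted LatentDirichletAllocation)
# and n_top_words (an int) are defined earlier in the linked example.

def print_top_words(model, feature_names, n_top_words):
    # Each row of model.components_ holds the word weights of one topic.
    for topic_idx, topic in enumerate(model.components_):
        print("Topic #%d:" % topic_idx)
        # argsort is ascending, so this slice picks the n_top_words largest, largest first.
        print(" ".join([feature_names[i]
                        for i in topic.argsort()[:-n_top_words - 1:-1]]))
    print()

print("\nTopics in LDA model:")
# On scikit-learn 1.0+ use get_feature_names_out(); get_feature_names() was removed in 1.2.
tf_feature_names = tf_vectorizer.get_feature_names()
print_top_words(lda, tf_feature_names, n_top_words)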

My question is this: is there any component or matrix of the fitted LDA model from which we can get the document-topic association?

For example, I need to find the top 2 topics associated with each document to use as that document's label / category. Is there a component that gives the distribution of topics within a document, similar to model.components_, which gives the distribution of words within a topic?

Answer

You can compute the document-topic association using the transform(X) method of the LatentDirichletAllocation class.

Applied to the example code, this would be:

doc_topic_distrib = lda.transform(tf)

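where lda is the fitted LDA model and tf is the input data (the document-term count matrix) you want to transform. The result has one row per document and one column per topic.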
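To get from a document-topic matrix to the top 2 topics per document asked about in the question, one option is to sort each row. A minimal sketch, assuming a hypothetical hand-made 3 x 4 matrix in place of real `lda.transform(tf)` output; the helper `top_k_topics` is introduced here only for illustration:

```python
import numpy as np

def top_k_topics(doc_topic, k=2):
    """Hypothetical helper: for each document (row), return the column
    indices of its k highest-weighted topics, best first."""
    doc_topic = np.asarray(doc_topic)
    # argsort sorts ascending; reverse the columns, then keep the first k.
    return np.argsort(doc_topic, axis=1)[:, ::-1][:, :k]

# Hypothetical matrix shaped like lda.transform(tf): (n_documents, n_topics).
doc_topic_distrib = np.array([
    [0.10, 0.60, 0.25, 0.05],
    [0.70, 0.05, 0.05, 0.20],
    [0.15, 0.15, 0.30, 0.40],
])

top2 = top_k_topics(doc_topic_distrib, k=2)
# Weights of the chosen topics, in the same order as top2.
top2_weights = np.take_along_axis(doc_topic_distrib, top2, axis=1)

print(top2)
# [[1 2]
#  [0 3]
#  [3 2]]
```

With a fitted model, pass `lda.transform(tf)` instead of the hand-made array. Recent scikit-learn versions normalize each row of the `transform` output to sum to 1, so the weights read as topic proportions; if a version returns unnormalized values, the ranking (and therefore the top-2 labels) is unchanged, because each row is only rescaled by a positive factor.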