Topic modelling - Assign a document with top 2 topics as category label - sklearn Latent Dirichlet Allocation

Views: 437 | Published: 2020/4/30 8:38:34 | Tags: python python-2.7 scikit-learn lda topic-modeling


Problem description

I am now going through the LDA (Latent Dirichlet Allocation) topic modelling method to help extract topics from a set of documents. From what I have understood from the link below, this is an unsupervised learning approach for categorizing / labelling each document with the extracted topics.

Topic extraction with Non-negative Matrix Factorization and Latent Dirichlet Allocation

In the sample code given in that link, a function is defined to get the top words associated with each identified topic.

sklearn.__version__

Out[41]: '0.17'

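# Needed on Python 2.7 so that print() behaves as a function.
from __future__ import print_function

from sklearn.decomposition import LatentDirichletAllocation


def print_top_words(model, feature_names, n_top_words):
    # model.components_ has one row of word weights per topic.
    for topic_idx, topic in enumerate(model.components_):
        print("Topic #%d:" % topic_idx)
        # argsort() is ascending; the reversed slice keeps the n_top_words largest.
        print(" ".join([feature_names[i]
                        for i in topic.argsort()[:-n_top_words - 1:-1]]))
    print()

# tf_vectorizer, lda and n_top_words are defined earlier in the linked example.
print("\nTopics in LDA model:")
tf_feature_names = tf_vectorizer.get_feature_names()
print_top_words(lda, tf_feature_names, n_top_words)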
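
As a side note on the slice used in print_top_words: topic.argsort()[:-n_top_words - 1:-1] selects the indices of the n_top_words largest weights, largest first. A minimal sketch with made-up weights (the array values are invented for illustration and do not come from any fitted model):

```python
import numpy as np

# Made-up word weights for a single topic (illustration only).
weights = np.array([0.1, 0.7, 0.05, 0.4])
n_top_words = 2

# argsort() returns indices ordered from the smallest to the largest weight.
ascending = weights.argsort()

# Walk backwards from the last index and stop after n_top_words entries.
top = ascending[:-n_top_words - 1:-1]

print(top)  # indices of the largest weights, largest first
```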

My question is this: is there any component or matrix of the fitted LDA model from which we can get the document-topic association?

For example, I need to find the top 2 topics associated with each document and use them as that document's label / category. Is there any component that gives the distribution of topics within a document, similar to model.components_, which gives the distribution of words within a topic?
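
One possible approach, offered as a sketch rather than a definitive answer: the estimator's transform (or fit_transform) method returns a matrix with one row per document and one column per topic, and the two largest entries in each row can serve as that document's labels. The toy corpus, the choice of 3 topics and the variable names doc_topic and top2 are assumptions made for illustration. The code uses the n_components parameter name from recent scikit-learn releases; the 0.17 release shown in the question named it n_topics.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus, invented for illustration only.
docs = [
    "the cat sat on the mat with another cat",
    "dogs and cats are popular pets",
    "stocks fell as the market reacted to interest rates",
    "investors watch the market and bond rates closely",
    "the football team won the match in extra time",
    "the coach praised the team after the match",
]

# Term-frequency matrix, as in the linked example.
tf_vectorizer = CountVectorizer(stop_words="english")
tf = tf_vectorizer.fit_transform(docs)

# Assumption: 3 topics. Use n_topics instead of n_components on 0.17.
lda = LatentDirichletAllocation(n_components=3, random_state=0)

# Rows are documents, columns are topics.
doc_topic = lda.fit_transform(tf)

# Make each row sum to 1 (a no-op if the installed release already does this).
doc_topic = doc_topic / doc_topic.sum(axis=1, keepdims=True)

# Sort topic indices by descending weight and keep the first two per document.
top2 = np.argsort(doc_topic, axis=1)[:, ::-1][:, :2]

for doc_idx, (first, second) in enumerate(top2):
    print("Doc #%d: top topics %d and %d" % (doc_idx, first, second))
```

For documents that were not part of the fit, lda.transform(new_tf) gives the same kind of matrix, provided new_tf was produced by the same fitted tf_vectorizer.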
