Topic Modeling: How do I use my fitted LDA model to predict new topics for a new dataset in R?


Problem description

I am using the 'lda' package in R for topic modeling. I want to apply an already fitted Latent Dirichlet Allocation (LDA) model to a new dataset and predict topics (collections of related words in a document) for it. While looking into this, I came across the predictive.distribution() function. However, that function takes document_sums as an input parameter, and document_sums is itself part of the output of fitting a model, so it is not clear to me how to obtain it for new documents without fitting a new model. I need help understanding how to use an existing model on a new dataset to predict topics. Here is the example code from the package documentation written by Jonathan Chang:

library(lda)

# Fit a model
data(cora.documents)
data(cora.vocab)

K <- 10  # Number of topics

# Positional arguments: documents, K, vocab, num.iterations = 25, alpha = 0.1, eta = 0.1
result <- lda.collapsed.gibbs.sampler(cora.documents, K, cora.vocab, 25, 0.1, 0.1)

# Predict new words for the first two documents
# (arguments: document_sums, topics, alpha = 0.1, eta = 0.1)
predictions <- predictive.distribution(result$document_sums[, 1:2], result$topics, 0.1, 0.1)

# Use top.topic.words to show the top 5 predictions in each document.
top.topic.words(t(predictions), 5)

Any help will be appreciated.

Thanks & Regards

Ankit

Recommended answer

I don't know how you can achieve this in R, but please have a look at the 2009 publication by Wallach et al. titled 'Evaluation Methods for Topic Models'. Section 4 describes three methods to calculate P(z|w): one based on importance sampling, and two others called the 'Chib-style estimator' and the 'left-to-right estimator'.

MALLET has an implementation of the left-to-right estimator method.
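For the R side of the question, one possible approach with the 'lda' package itself is to run the Gibbs sampler on the new documents while holding the fitted topics fixed, and then pass the resulting document_sums to predictive.distribution(). The sketch below is not taken from the question or the answer above. It assumes that lda.collapsed.gibbs.sampler() accepts an `initial` list with `topics` and `topic_sums` entries together with `freeze.topics = TRUE` (this is my reading of the package documentation; please check ?lda.collapsed.gibbs.sampler for your installed version). The last 10 cora documents merely stand in for a "new dataset"; genuinely new raw text would first have to be encoded against the same vocabulary (for example with lexicalize(..., vocab = cora.vocab)).

```r
library(lda)

data(cora.documents)
data(cora.vocab)

K     <- 10
alpha <- 0.1
eta   <- 0.1

# Hold out the last 10 documents to play the role of the "new dataset"
# (an illustrative split, not part of the original example).
n          <- length(cora.documents)
train_docs <- cora.documents[seq_len(n - 10)]
new_docs   <- cora.documents[(n - 9):n]

# 1. Fit the model on the training documents only.
fit <- lda.collapsed.gibbs.sampler(train_docs, K, cora.vocab, 25, alpha, eta)

# 2. Sample topic assignments for the new documents. The word-topic counts
#    are initialised from the fitted model and, with freeze.topics = TRUE,
#    are assumed to stay fixed while only the new documents are sampled.
new_fit <- lda.collapsed.gibbs.sampler(
  new_docs, K, cora.vocab, 25, alpha, eta,
  initial       = list(topics = fit$topics, topic_sums = fit$topic_sums),
  freeze.topics = TRUE
)

# 3. Smoothed topic proportions for each new document (one row per document).
theta <- t(new_fit$document_sums) + alpha
theta <- theta / rowSums(theta)

# 4. Predictive word distribution for the new documents, using the
#    topics of the original fit, and the top 5 predicted words per document.
predictions <- predictive.distribution(new_fit$document_sums, fit$topics, alpha, eta)
top.topic.words(t(predictions), 5)
```

Here `theta` gives the estimated topic mixture of each new document, which is usually what "predicting topics" for unseen data amounts to; the topics themselves (the word distributions in fit$topics) are not re-estimated in this sketch.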