Different results of LDA using R (topicmodels)

Question

I am using R topicmodels to train an LDA model on a small corpus, but every time I rerun the same code I get different results (different topics and different topic terms). Why do the same settings and the same corpus give a different result each time, and what should I do to make the result stable? Here is my code:

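library(tm)
library(topicmodels)

# Read every file under ./corpus/train as one document
cname <- file.path(".", "corpus", "train")
docs <- Corpus(DirSource(cname))

# Replace separator characters with a space so neighbouring words are not glued together
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
docs <- tm_map(docs, toSpace, "/")
docs <- tm_map(docs, toSpace, "@")
docs <- tm_map(docs, toSpace, "#")
docs <- tm_map(docs, toSpace, "\\|")
docs <- tm_map(docs, toSpace, "&")

# Normalise case, then drop numbers, punctuation, stop words and extra whitespace
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removeNumbers)
docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, removeWords, stopwords("english"))
docs <- tm_map(docs, removeWords, c("amp"))
docs <- tm_map(docs, stripWhitespace)

# Build the document-term matrix, fit a 5-topic LDA model, list the top 10 terms per topic
dtm <- DocumentTermMatrix(docs)
dtm_LDA <- LDA(dtm, 5)
get_terms(dtm_LDA, 10)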

I have tried set.seed, but it does not seem to work. I also found a similar question, "LDA model generates different topics every time I train on the same corpus", but it is about Python.

Recommended answer

For anyone who comes across the same issue: you can fix the random seed by passing it through the control argument of the LDA function, as below. Find more information here.

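data("AssociatedPress", package = "topicmodels")  # example document-term matrix shipped with topicmodels
lda <- LDA(AssociatedPress[1:20, ], control = list(seed = 0), k = 2)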
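
As a quick check, the sketch below fits the same model twice with the same seed and compares the top terms. It assumes the topicmodels package is installed and uses its bundled AssociatedPress data; the names dtm_small, fit_a and fit_b are only illustrative. My understanding is that set.seed has no effect here because the default VEM fit takes its seed from the control list, which falls back to a time-based value when none is given, so treat that explanation as an assumption and verify it against the package documentation.

```r
library(topicmodels)

# Small example document-term matrix bundled with topicmodels
data("AssociatedPress", package = "topicmodels")
dtm_small <- AssociatedPress[1:20, ]

# Fit the same 2-topic model twice, passing the same seed via `control`
fit_a <- LDA(dtm_small, k = 2, control = list(seed = 0))
fit_b <- LDA(dtm_small, k = 2, control = list(seed = 0))

# Compare the top 10 terms per topic from the two fits
same_terms <- identical(terms(fit_a, 10), terms(fit_b, 10))
print(same_terms)
```

For the code in the question, the equivalent change would be LDA(dtm, k = 5, control = list(seed = 1234)); the value 1234 is arbitrary, and any fixed integer should do.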
