How to reproduce exact results with the LDA function in R's topicmodels package

Question

I've been unable to create reproducible results from topicmodels' LDA function. To take an example from their documentation:

library(topicmodels)
set.seed(0)
lda1 <- LDA(AssociatedPress[1:20, ], control=list(seed=0), k=2)
set.seed(0)
lda2 <- LDA(AssociatedPress[1:20, ], control=list(seed=0), k=2)
identical(lda1, lda2)
# [1] FALSE

How can I get identical results from two separate calls to LDA?

As an aside (in case the package authors are on here), I find the control=list(seed=0) snippet unfortunate and unnecessary. Behind the scenes, there's a line for if (missing(seed)) seed <- as.integer(Sys.time()). This doesn't make the process more reliably random, it only undoes a specified seed. Am I missing something?
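To make the point about the seed concrete, here is a minimal sketch in base R. The function toy_fit is a hypothetical stand-in, not topicmodels code: it assumes only the pattern quoted above, where a missing seed falls back to the clock and the function then seeds its own random draws.

```r
# Hypothetical stand-in for a fitting function that manages its own seed.
# This is NOT topicmodels code; it only mimics the quoted fallback pattern.
toy_fit <- function(seed) {
  if (missing(seed)) seed <- as.integer(Sys.time())  # clock-based fallback
  set.seed(seed)  # overrides any set.seed() the caller issued earlier
  runif(3)        # stands in for the stochastic part of the fit
}

# An explicit seed makes two separate calls reproducible.
a <- toy_fit(seed = 0)
b <- toy_fit(seed = 0)
identical(a, b)

# Calling set.seed() first does not help when no seed is passed:
# the function reseeds from the clock, so the two results can differ.
set.seed(0)
c1 <- toy_fit()
set.seed(0)
c2 <- toy_fit()
```

In this sketch the caller's set.seed(0) is simply overwritten, which would explain why only the seed passed through the function's own arguments matters.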

UPDATE: As @hrbrmstr discovered below, passing a seed as a control results in effectively identical objects, with the only difference being a temp local file location. So this question is more of a misunderstanding (though still seems like it would be clearer if the function respected set.seed()).

Recommended answer

Not really an "answer" but there's no other way to post code snippets :-)

I gave it a try:

library(topicmodels)

data(AssociatedPress)

lda1 <- LDA(AssociatedPress[1:20, ], control=list(seed=0), k=2)
lda2 <- LDA(AssociatedPress[1:20, ], control=list(seed=0), k=2)

identical(lda1, lda2)
# [1] FALSE

all.equal(lda1, lda2)
# [1] "Attributes: < Component 5: Attributes: < Component 10: 1 string mismatch > >"

a1 <- posterior(lda1, AssociatedPress)
a2 <- posterior(lda2, AssociatedPress)

identical(a1, a2)
# [1] TRUE

all.equal(a1, a2)
# [1] TRUE

all.equal(lda1@alpha,lda2@alpha)
# [1] TRUE
all.equal(lda1@call,lda2@call)
# [1] TRUE
all.equal(lda1@Dim,lda2@Dim)
# [1] TRUE
all.equal(lda1@control,lda2@control)
# [1] "Attributes: < Component 10: 1 string mismatch >"
all.equal(lda1@k,lda2@k)
# [1] TRUE
all.equal(lda1@terms,lda2@terms)
# [1] TRUE
all.equal(lda1@documents,lda2@documents)
# [1] TRUE
all.equal(lda1@beta,lda2@beta)
# [1] TRUE
all.equal(lda1@gamma,lda2@gamma)
# [1] TRUE
all.equal(lda1@wordassignments,lda2@wordassignments)
# [1] TRUE
all.equal(lda1@loglikelihood,lda2@loglikelihood)
# [1] TRUE
all.equal(lda1@iter,lda2@iter)
# [1] TRUE
all.equal(lda1@logLiks,lda2@logLiks)
# [1] TRUE
all.equal(lda1@n,lda2@n)
# [1] TRUE

identical(lda1@alpha,lda2@alpha)
# [1] TRUE
identical(lda1@call,lda2@call)
# [1] TRUE
identical(lda1@Dim,lda2@Dim)
# [1] TRUE
identical(lda1@control,lda2@control)
# [1] FALSE
identical(lda1@k,lda2@k)
# [1] TRUE
identical(lda1@terms,lda2@terms)
# [1] TRUE
identical(lda1@documents,lda2@documents)
# [1] TRUE
identical(lda1@beta,lda2@beta)
# [1] TRUE
identical(lda1@gamma,lda2@gamma)
# [1] TRUE
identical(lda1@wordassignments,lda2@wordassignments)
# [1] TRUE
identical(lda1@loglikelihood,lda2@loglikelihood)
# [1] TRUE
identical(lda1@iter,lda2@iter)
# [1] TRUE
identical(lda1@logLiks,lda2@logLiks)
# [1] TRUE
identical(lda1@n,lda2@n)
# [1] TRUE

Does the "inequality" in @control matter?
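The slot-by-slot checks above can be condensed into a small helper. This is only a sketch: differing_slots is a name made up here, not part of topicmodels, and it relies solely on slotNames(), slot() and identical() from base R and the methods package. The topicmodels part is guarded so the block still runs where the package is not installed.

```r
library(methods)

# Hypothetical helper (not part of topicmodels): return the names of the
# slots in which two S4 objects of the same class are not identical.
differing_slots <- function(x, y) {
  stopifnot(identical(class(x), class(y)))
  nms <- slotNames(x)
  same <- vapply(nms, function(s) identical(slot(x, s), slot(y, s)), logical(1))
  nms[!same]
}

# Apply it to two fits with the same seed, if topicmodels is available.
if (requireNamespace("topicmodels", quietly = TRUE)) {
  library(topicmodels)
  data("AssociatedPress", package = "topicmodels")
  lda1 <- LDA(AssociatedPress[1:20, ], control = list(seed = 0), k = 2)
  lda2 <- LDA(AssociatedPress[1:20, ], control = list(seed = 0), k = 2)

  # Which slots of the fitted models differ?
  print(differing_slots(lda1, lda2))
  # Which fields inside the control objects differ?
  print(differing_slots(lda1@control, lda2@control))
}
```

Going by the output above, the first call should report only the control slot, and the second should single out the one string field that all.equal() flags as mismatched.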
