Making functions that set the random seed independent

Question

Sometimes I want to write a randomized function that always returns the same output for a particular input. I've always implemented this by setting the random seed at the top of the function and then proceeding. Consider two functions defined in this way:

sample.12 <- function(size) {
  set.seed(144)
  sample(1:2, size, replace=TRUE)
}
rand.prod <- function(x) {
  set.seed(144)
  runif(length(x)) * x
}

sample.12 returns a vector of the specified size randomly sampled from the set {1, 2} and rand.prod multiplies each element of a specified vector by a random value uniformly selected from [0, 1]. Normally I would expect x <- sample.12(10000) ; rand.prod(x) to have a "step" distribution with pdf 3/4 in the range [0, 1] and 1/4 in the range [1, 2], but due to my unfortunate choice of identical random seeds above I see a different result:

x <- sample.12(10000)
hist(rand.prod(x))
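
To see why a shared seed distorts the result, here is a minimal sketch of the mechanism. It assumes that sample maps each underlying uniform draw to {1, 2} by thresholding, which is how older R versions behaved by default; newer versions may select elements differently, so the exact histogram can vary. The names n, u, x and y are illustrative only.

```r
# Sketch: reuse the same uniform draws for both steps, as happens when
# both functions reset the generator to the same seed.
set.seed(144)
n <- 10000
u <- runif(n)

# Assumed thresholding rule: draws below 0.5 map to 1, the rest to 2.
x <- floor(2 * u) + 1

# rand.prod would then multiply x by the very same draws.
y <- u * x

# Small draws stay below 0.5 and large draws are doubled into [1, 2),
# so under this assumption no value can land in [0.5, 1).
sum(y >= 0.5 & y < 1)
```

Under that assumption the product is perfectly dependent on the sampled value, which is the opposite of the independence the step distribution requires.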

I can fix this issue in this case by changing the random seed in one of the functions to some other value. For instance, with set.seed(10000) in rand.prod I get the expected distribution:
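
For reference, the changed pair of functions might look like the sketch below. Only the seed in rand.prod differs from the original definitions; the final line is an informal check added here, not part of the question.

```r
sample.12 <- function(size) {
  set.seed(144)
  sample(1:2, size, replace = TRUE)
}
rand.prod <- function(x) {
  set.seed(10000)  # different seed from the one used in sample.12
  runif(length(x)) * x
}

x <- sample.12(10000)
y <- rand.prod(x)

# Share of values below 1; the step distribution predicts roughly 3/4.
mean(y < 1)
```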

Previously on SO this solution of using different seeds has been accepted as the best approach to generate independent random number streams. However, I find the solution to be unsatisfying because streams with different seeds could be related to one another (possibly even highly related to one another); in fact, they might even yield identical streams according to ?set.seed:

There is no guarantee that different values of seed will seed the RNG differently, although any exceptions would be extremely rare.

Is there a way to implement a pair of randomized functions in R that:

  1. Always return the same output for a particular input, and
  2. Enforce independence between their sources of randomness by more than just using different random seeds?

Recommended answer

I've dug into this some more and it looks like the rlecuyer package provides independent random streams:

Provides an interface to the C implementation of the random number generator with multiple independent streams developed by L'Ecuyer et al (2002). The main purpose of this package is to enable the use of this random number generator in parallel R applications.

The first step is global initialization of the independent streams:

library(rlecuyer)
.lec.CreateStream(c("stream.12", "stream.prod"))

Then each function needs to be modified to reset the appropriate stream to its beginning state (.lec.ResetStartStream), set the R random number generator to the appropriate stream (.lec.CurrentStream), and afterward set the R random number generator back to its state before the function was called (.lec.CurrentStreamEnd).

sample.12 <- function(size) {
  .lec.ResetStartStream("stream.12")
  .lec.CurrentStream("stream.12")
  x <- sample(1:2, size, replace=TRUE)
  .lec.CurrentStreamEnd()
  x
}
rand.prod <- function(x) {
  .lec.ResetStartStream("stream.prod")
  .lec.CurrentStream("stream.prod")
  y <- runif(length(x)) * x
  .lec.CurrentStreamEnd()
  y
}
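
Both functions repeat the same reset, switch and restore steps, so the pattern could be factored into a helper. The sketch below is one possible way to do that and is not part of the original answer: with_stream and the stream name "stream.demo" are hypothetical, and it assumes the rlecuyer package is installed. Using on.exit means the generator is switched back even if the body raises an error.

```r
library(rlecuyer)

# Hypothetical helper: evaluate expr with the named rlecuyer stream active,
# starting from the beginning of that stream.
with_stream <- function(name, expr) {
  .lec.ResetStartStream(name)
  .lec.CurrentStream(name)
  on.exit(.lec.CurrentStreamEnd())  # runs even if expr fails
  force(expr)                       # expr is evaluated lazily, i.e. here
}

.lec.CreateStream("stream.demo")    # illustrative stream name

a <- with_stream("stream.demo", runif(5))
b <- with_stream("stream.demo", runif(5))
identical(a, b)
```

With such a helper, sample.12 would reduce to a single call that wraps sample(1:2, size, replace = TRUE) in with_stream("stream.12", ...).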

This satisfies the "always returns the same output given the same input" requirement:

all.equal(rand.prod(sample.12(10000)), rand.prod(sample.12(10000)))
# [1] TRUE

The streams also appear to operate independently in our example:

x <- sample.12(10000)
hist(rand.prod(x))

Note that this would not give consistent values across runs of our script because each call to .lec.CreateStream would give a different initial state. To address this, we could note the initial state for each stream:

.lec.GetState("stream.12")
# [1] 3161578179 1307260052 2724279262 1101690876 1009565594  836476762
.lec.GetState("stream.prod")
# [1]  596094074 2279636413 3050913596 1739649456 2368706608 3058697049

We can then change the stream initialization at the beginning of the script to:

library(rlecuyer)
.lec.CreateStream(c("stream.12", "stream.prod"))
.lec.SetSeed("stream.12", c(3161578179, 1307260052, 2724279262, 1101690876, 1009565594, 836476762))
.lec.SetSeed("stream.prod", c(596094074, 2279636413, 3050913596, 1739649456, 2368706608, 3058697049))

Now calls to sample.12 and rand.prod will match across calls to the script.
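
As a possible alternative that is not part of the answer above, base R's parallel package exposes the same family of L'Ecuyer streams through RNGkind("L'Ecuyer-CMRG") and nextRNGStream. The sketch below derives two stream states from one master seed, so nothing needs to be copied by hand between runs. The names seed.12 and seed.prod are illustrative, and the sketch assumes it is acceptable to switch the whole script to the L'Ecuyer-CMRG generator.

```r
library(parallel)

# Switch generators once, then derive two stream states from one seed.
RNGkind("L'Ecuyer-CMRG")
set.seed(144)
seed.12   <- .Random.seed
seed.prod <- nextRNGStream(seed.12)

sample.12 <- function(size) {
  # Save the caller's RNG state and restore it on exit.
  old <- get(".Random.seed", envir = globalenv())
  on.exit(assign(".Random.seed", old, envir = globalenv()))
  assign(".Random.seed", seed.12, envir = globalenv())
  sample(1:2, size, replace = TRUE)
}

rand.prod <- function(x) {
  old <- get(".Random.seed", envir = globalenv())
  on.exit(assign(".Random.seed", old, envir = globalenv()))
  assign(".Random.seed", seed.prod, envir = globalenv())
  runif(length(x)) * x
}

before <- .Random.seed
x <- sample.12(10000)
y <- rand.prod(x)
identical(before, .Random.seed)  # the caller's RNG state is left untouched
```

Because each function restarts from a fixed stream state, repeated calls with the same input return the same output, and the two functions draw from different streams of the same generator.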
