Seed setting: why is the output different after no change in input


Question

Setting a seed ensures reproducibility, which matters in simulation modelling. Consider a simple model f() with two variables of interest, y1 and y2. Their outputs are determined by a random process (rbinom()) and the parameters x1 and x2, respectively. The outputs of the two variables are independent of each other.

Now say we want to run a sensitivity analysis: compare the output of a variable after changing its parameter against the default scenario. If all other parameters are unchanged and the same seed is set, shouldn't the output of the unaffected variable stay the same as in the default simulation, since it is independent of the other variable?

In short, why does the output of y2 below, which is determined by x2, change after only x1 is changed, even though the same seed is set? One could ignore y2 and focus only on y1, but in a larger simulation where each variable is a cost component of the total cost, a change in an unaffected variable becomes a problem when testing the model's overall sensitivity to individual parameter changes.

#~ parameters and model

x1 <- 0.0
x2 <- 0.5
n  <- 10
ts <- 5

f <- function(){
  out <- data.frame(step = rep(0, n),
                    space = 1:n,
                    id = 1:n,
                    y1 = rep(1, n),
                    y2 = rep(0, n))
  
  l.out <- vector(mode = "list", length = ts)  #~ one slot per time step
  
  for(i in 1:ts){
    out$step <- i
    out$y1[out$y1 == 0] <- 1
    out$id[out$y2 == 1]  <- seq_along(which(out$y2 == 1)) + n
    out$y2[out$y2 == 1] <- 0
    
    out$y1 <- rbinom(nrow(out), 1, 1-x1)
    out$y2 <- rbinom(nrow(out), 1, x2)
    
    n  <- max(out$id)
    l.out[[i]] <- out
  }
  do.call(rbind, l.out)
}

#~ Simulation 1 (default)
set.seed(1)
run1 <- f()
set.seed(1)
run2 <- f()
run1 == run2 #~ all observations true as expected

#~ Simulation 2
#~ change in x1 parameter affecting only variable y1
x1 <- 0.25
set.seed(1)
run3 <- f()
set.seed(1)
run4 <- f()
run3 == run4 #~ all observations true as expected

#~ compare variables after change in x1 has occurred
run1$y1 == run3$y1  #~ observations differ as expected
run1$y2 == run3$y2  #~ observations differ - why?

Recommended answer

Great question. The reason for this behaviour is that when you set p = 0 or p = 1 in rbinom, the underlying C function realises it doesn't need to sample, so it returns without calling the random number generator. The generator's state only advances when it is actually called, so if p is strictly between 0 and 1 the state changes, but if p is 0 or 1 it doesn't. In your default run x1 is 0, so rbinom is called with p = 1 - x1 = 1 for y1 and consumes no random numbers; with x1 = 0.25 it does, which shifts the draws that y2 receives. You can see this in the source code.
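One way to check this claim directly is to compare .Random.seed before and after each call. The following is a minimal sketch, assuming R's default generator; the variable names (state_before, same_after_p1 and so on) are made up for illustration.

```r
set.seed(1)
state_before <- .Random.seed             # snapshot of the RNG state

invisible(rbinom(5, 1, 1))               # p = 1: no sampling needed
same_after_p1 <- identical(state_before, .Random.seed)

invisible(rbinom(5, 1, 0))               # p = 0: no sampling needed
same_after_p0 <- identical(state_before, .Random.seed)

invisible(rbinom(5, 1, 0.5))             # 0 < p < 1: the generator is used
same_after_half <- identical(state_before, .Random.seed)

c(same_after_p1, same_after_p0, same_after_half)
```

If the explanation above holds, the first two comparisons should be TRUE and the last one FALSE.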

Under normal circumstances, when p is strictly between 0 and 1, your set-up should work fine:

set.seed(1)
x1 <- rbinom(5, 1, 0.4)
y1 <- rbinom(5, 1, 0.5)

set.seed(1)
x2 <- rbinom(5, 1, 0.1)
y2 <- rbinom(5, 1, 0.5)

all(y1 == y2)
#> [1] TRUE

But if you set p to 1 or 0 in one of the runs, the results will be different:

set.seed(1)
x1 <- rbinom(5, 1, 0.4)
y1 <- rbinom(5, 1, 0.5)

set.seed(1)
x2 <- rbinom(5, 1, 1)
y2 <- rbinom(5, 1, 0.5)

all(y1 == y2)
#> [1] FALSE

To show this is correct, we should get y1 == y2 if we set p to 0 the first time and p to 1 the second time, since neither call consumes any random numbers:

set.seed(1)
x1 <- rbinom(5, 1, 0)
y1 <- rbinom(5, 1, 0.5)

set.seed(1)
x2 <- rbinom(5, 1, 1)
y2 <- rbinom(5, 1, 0.5)

all(y1 == y2)
#> [1] TRUE
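For the sensitivity analysis in the question, one possible workaround is to draw the Bernoulli outcomes in a way that always consumes the same number of random numbers, whatever p is. The sketch below is only an illustration, not part of the original answer: rbern_fixed is a hypothetical helper, and it will not reproduce the exact values rbinom would give for the same seed.

```r
# Hypothetical helper (not from the original answer): thresholding runif()
# draws n uniforms on every call, even when p is 0 or 1.
rbern_fixed <- function(n, p) as.integer(runif(n) < p)

set.seed(1)
a1 <- rbern_fixed(5, 0)      # p = 0 still advances the generator
b1 <- rbern_fixed(5, 0.5)

set.seed(1)
a2 <- rbern_fixed(5, 0.25)   # changing the first p ...
b2 <- rbern_fixed(5, 0.5)    # ... should leave the second variable alone

all(b1 == b2)
```

Because both runs use up five uniforms before b1 and b2 are drawn, the second variable should be identical across the two runs. Inside f(), this would mean swapping the two rbinom() calls for such a helper, assuming the number of rows stays constant between runs.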
