如何在 R 中设置.Seed 进行模拟以在 Windows 操作系统上实现可重复性 [英] How Do I Set.Seed for simulation in R to attain reproducibility on Windows OS

查看:43
本文介绍了如何在 R 中设置.Seed 进行模拟以在 Windows 操作系统上实现可重复性的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在 R 中使用以下函数进行了模拟:

## 加载包并准备多核进程图书馆(预测)图书馆(未来.申请)计划(多会话)图书馆(并行)图书馆(foreach)库(doParallel)n_cores <-detectCores()cl <- makeCluster(n_cores)registerDoParallel(cores = detectCores())set.seed(1)bootstrap1 <- 函数(n,phi){ts <- arima.sim(n, model = list(ar=p order = c(1, 1, 0)), sd = 1)#ts <- 数字(n)#ts[1] <- 范数(1)#for(i in 2:length(ts))# ts[i] <- 2 * ts[i - 1] + rnorm(1)####################################################### 创建一个块大小的向量t <- length(ts) # 时间序列的长度lb <- seq(n-2)+1 # 块大小为 1 的向量 <<n(即仅介于 1 和 n 之间)#######################################################本节创建矩阵来存储块均值BOOTSTRAP <- 矩阵(nrow = 1,ncol = 长度(lb))列名(引导程序)<-lb#BOOTSTRAP <- 列表(长度(磅))#######################################################本节使用foreach函数做大括号中的细节引导程序 <- foreach(b = 1:length(lb), .combine = 'cbind') %dopar%{l <- lb[b]# 每个实例的块大小m <-天花板(t/l)#块数blk <- split(ts​​, rep(1:m, each=l, length.out = t)) # 将系列分成块###################################################res<-sample(blk, replace=T, 1000) # 重新采样块res.unlist <- unlist(res, use.names = FALSE) # 取消列出 bootstrap 系列train <- head(res.unlist, round(length(res.unlist) - 10)) # 训练集test <- tail(res.unlist, length(res.unlist) - length(train)) # 测试集nfuture <- forecast::forecast(train, model = forecast::auto.arima(train), lambda=0, biasadj=TRUE, h = length(test))$mean # 进行测试集的预测RMSE <- Metrics::rmse(test, nfuture) # RETURN RMSE引导程序[b] <- RMSE}BOOTSTRAPS <- 矩阵(BOOTSTRAP,nrow = 1,ncol = 长度(lb))列名(引导程序)<- lb自举返回(列表(引导程序"=引导程序))}

我使用 for 循环 将其结果打印 3 次.

for (i in 1:3) { set.seed(1)打印(bootstrap1(10,0.5))}

我有以下结果:

## 2 3 4 5 6 7 8 9##[1,] 1.207381 1.447382 1.282099 0.9311434 0.8481634 1.006494 0.9829584 1.205194## 2 3 4 5 6 7 8 9##[1,] 1.404846 1.262756 1.50738 1.188452 0.8981125 1.001651 1.349721 1.579556## 2 3 4 5 6 7 8 9##[1,] 1.265196 1.080703 1.074807 1.430653 0.9166268 1.12537 0.9492137 1.201763

如果我必须多次运行这个,我会得到不同的结果.

我想要设置种子的方式,这样三轮将是不同的,而如果我使用设置的种子运行,我将使用 R.

解决方案

我们可以在set.seed中指定kind.如果我们在循环内执行此操作,它将返回相同的值

for (i in 1:3) {set.seed(1, kind = "L'Ecuyer-CMRG")打印(bootstrap1(10,0.5))}#$引导程序# 2 3 4 5 6 7 8 9#[1,] 4.189426 6.428085 3.672116 3.893026 2.685741 3.821201 3.286509 4.062811#$引导程序# 2 3 4 5 6 7 8 9#[1,] 4.189426 6.428085 3.672116 3.893026 2.685741 3.821201 3.286509 4.062811#$引导程序# 2 3 4 5 6 7 8 9#[1,] 4.189426 6.428085 3.672116 3.893026 2.685741 3.821201 3.286509 4.062811


如果想要在 for 循环中为每次迭代返回不同的值并在后续运行中获得相同的结果,请在循环外指定 set.seed>

1)第一次运行

set.seed(1, kind = "L'Ecuyer-CMRG")for (i in 1:3) {打印(bootstrap1(10,0.5))}#$引导程序# 2 3 4 5 6 7 8 9#[1,] 4.189426 6.428085 3.672116 3.893026 2.685741 3.821201 3.286509 4.062811#$引导程序# 2 3 4 5 6 7 8 9#[1,] 1.476428 1.806258 2.071091 2.09906 2.014298 1.032776 2.573738 1.831142#$引导程序# 2 3 4 5 6 7 8 9#[1,] 2.248546 1.838302 2.345557 1.696614 2.06357 1.502569 1.912556 1.906049

2) 第二次运行

set.seed(1, kind = "L'Ecuyer-CMRG")for (i in 1:3) {打印(bootstrap1(10,0.5))}#$引导程序# 2 3 4 5 6 7 8 9#[1,] 4.189426 6.428085 3.672116 3.893026 2.685741 3.821201 3.286509 4.062811#$引导程序# 2 3 4 5 6 7 8 9#[1,] 1.476428 1.806258 2.071091 2.09906 2.014298 1.032776 2.573738 1.831142#$引导程序# 2 3 4 5 6 7 8 9#[1,] 2.248546 1.838302 2.345557 1.696614 2.06357 1.502569 1.912556 1.906049

根据?set.seed

<块引用>

L'Ecuyer-CMRG":-来自 L'Ecuyer (1999) 的组合多重递归生成器",其中的每个元素都是具有三个整数元素的反馈乘法生成器:因此种子是长度为 6 的(有符号)整数向量.周期约为 2^191.种子的 6 个元素在内部被视为 32 位无符号整数.前三个和后三个都不应该全为零,并且它们分别被限制为小于 4294967087 和 4294944443.这本身并不是特别有趣,但为包并行中使用的多个流提供了基础.

I have a simulation done with the below function in R:

## Load packages and prepare multicore process
library(forecast)
library(future.apply)
plan(multisession)
library(parallel)
library(foreach)
library(doParallel)
n_cores <- detectCores()
cl <- makeCluster(n_cores)
registerDoParallel(cores = detectCores())
set.seed(1)
bootstrap1 <- function(n, phi){
  ts <- arima.sim(n, model = list(ar=phi, order = c(1, 1, 0)), sd = 1)
  #ts <- numeric(n)
  #ts[1] <- rnorm(1)
  #for(i in 2:length(ts))
  #  ts[i] <- 2 * ts[i - 1] + rnorm(1)
  ########################################################
  ## create a vector of block sizes
  t <- length(ts)    # the length of the time series
  lb <- seq(n-2)+1   # vector of block sizes to be 1 < l < n (i.e to be between 1 and n exclusively)
  ########################################################
  ## This section create matrix to store block means
  BOOTSTRAP <- matrix(nrow = 1, ncol = length(lb))
  colnames(BOOTSTRAP) <-lb
  #BOOTSTRAP <- list(length(lb))
  ########################################################
  ## This section use foreach function to do detail in the brace
  BOOTSTRAP <- foreach(b = 1:length(lb), .combine = 'cbind') %dopar%{
    l <- lb[b]# block size at each instance 
    m <- ceiling(t / l)                                 # number of blocks
    blk <- split(ts, rep(1:m, each=l, length.out = t))  # divides the series into blocks
    ######################################################
    res<-sample(blk, replace=T, 1000)        # resamples the blocks
    res.unlist <- unlist(res, use.names = FALSE)   # unlist the bootstrap series
    train <- head(res.unlist, round(length(res.unlist) - 10)) # Train set
    test <- tail(res.unlist, length(res.unlist) - length(train)) # Test set
    nfuture <- forecast::forecast(train, model = forecast::auto.arima(train), lambda=0, biasadj=TRUE, h = length(test))$mean        # makes the `forecast of test set
    RMSE <- Metrics::rmse(test, nfuture)      # RETURN RMSE
    BOOTSTRAP[b] <- RMSE
  }
  BOOTSTRAPS <- matrix(BOOTSTRAP, nrow = 1, ncol = length(lb))
  colnames(BOOTSTRAPS) <- lb
  BOOTSTRAPS
  return(list("BOOTSTRAPS" = BOOTSTRAPS))
}

I use for loop to print its result three times.

for (i in 1:3)  { set.seed(1)
  print(bootstrap1(10, 0.5))
}

I have the below result:

##            2        3        4         5         6        7         8        9
##[1,] 1.207381 1.447382 1.282099 0.9311434 0.8481634 1.006494 0.9829584 1.205194
##            2        3       4        5         6        7        8        9
##[1,] 1.404846 1.262756 1.50738 1.188452 0.8981125 1.001651 1.349721 1.579556
##            2        3        4        5         6       7         8        9
##[1,] 1.265196 1.080703 1.074807 1.430653 0.9166268 1.12537 0.9492137 1.201763

If I have to run this several times I will be getting a different result.

I want the way I can set the seed such that the three-round will be distinct while if I run with the set seed, I will get the same three-distinct result using R.

解决方案

We could specify the kind in set.seed. If we are doing this inside the loop, it will return the same values

for (i in 1:3)  {
    set.seed(1, kind = "L'Ecuyer-CMRG")
   print(bootstrap1(10, 0.5))
 }
#$BOOTSTRAPS
#            2        3        4        5        6        7        8        9
#[1,] 4.189426 6.428085 3.672116 3.893026 2.685741 3.821201 3.286509 4.062811

#$BOOTSTRAPS
#            2        3        4        5        6        7        8        9
#[1,] 4.189426 6.428085 3.672116 3.893026 2.685741 3.821201 3.286509 4.062811

#$BOOTSTRAPS
#            2        3        4        5        6        7        8        9
#[1,] 4.189426 6.428085 3.672116 3.893026 2.685741 3.821201 3.286509 4.062811


If the intention is to return different values for each iteration in for loop and get the same result on subsequent runs, specify the set.seed outside the loop

1) First run

set.seed(1, kind = "L'Ecuyer-CMRG")
for (i in 1:3)  {    
    print(bootstrap1(10, 0.5))
  }
#$BOOTSTRAPS
#            2        3        4        5        6        7        8        9
#[1,] 4.189426 6.428085 3.672116 3.893026 2.685741 3.821201 3.286509 4.062811

#$BOOTSTRAPS
#            2        3        4       5        6        7        8        9
#[1,] 1.476428 1.806258 2.071091 2.09906 2.014298 1.032776 2.573738 1.831142

#$BOOTSTRAPS
#            2        3        4        5       6        7        8        9
#[1,] 2.248546 1.838302 2.345557 1.696614 2.06357 1.502569 1.912556 1.906049

2) Second run

set.seed(1, kind = "L'Ecuyer-CMRG")
for (i in 1:3)  {    
    print(bootstrap1(10, 0.5))
  }
#$BOOTSTRAPS
#            2        3        4        5        6        7        8        9
#[1,] 4.189426 6.428085 3.672116 3.893026 2.685741 3.821201 3.286509 4.062811

#$BOOTSTRAPS
#            2        3        4       5        6        7        8        9
#[1,] 1.476428 1.806258 2.071091 2.09906 2.014298 1.032776 2.573738 1.831142

#$BOOTSTRAPS
#            2        3        4        5       6        7        8        9
#[1,] 2.248546 1.838302 2.345557 1.696614 2.06357 1.502569 1.912556 1.906049

According to ?set.seed

"L'Ecuyer-CMRG": - A ‘combined multiple-recursive generator’ from L'Ecuyer (1999), each element of which is a feedback multiplicative generator with three integer elements: thus the seed is a (signed) integer vector of length 6. The period is around 2^191. The 6 elements of the seed are internally regarded as 32-bit unsigned integers. Neither the first three nor the last three should be all zero, and they are limited to less than 4294967087 and 4294944443 respectively. This is not particularly interesting of itself, but provides the basis for the multiple streams used in package parallel.

这篇关于如何在 R 中设置.Seed 进行模拟以在 Windows 操作系统上实现可重复性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆