Parallel Processing for Setting Seed in R


Question

I have R code that tells me for which seeds arima.sim(), when asked to simulate an ARIMA(1, 0, 0) process, produces a series that auto.arima() actually identifies as order (1, 0, 0).

MWE

library(forecast)
SEED_vector <- 1:10
arima_order_results <- data.frame()
flag <- TRUE
i <- 1
seed_out <- c()
while(flag){ 

  set.seed(SEED_vector[i])
  ar1 <- arima.sim(n = 20, model=list(ar=0.8, order = c(1, 0, 0)), sd = 1)
  ar2 <- auto.arima(ar1, ic = "aicc")
  if(all(arimaorder(ar2)==c(1,0,0))) {

    #print(arima_order_results)
    print(paste0('arimaorder', SEED_vector[i], ' ' , 
                 paste(arimaorder(ar2), collapse=" ")))
    seed_out <- c(seed_out, SEED_vector[i])

  }

  arima_order = arimaorder(ar2)
  arima_order = t(as.data.frame(arima_order))


  arima_order_results = rbind(arima_order_results,arima_order)

  i <- i+1
  if(i > length(SEED_vector)) {  # stop only after the last seed has been processed

    flag <- FALSE
  }

}

I am interested in which seed to set so that when I run

set.seed(seed_out)
ar1 <- arima.sim(n = 20, model=list(ar=0.8, order = c(1, 0, 0)), sd = 1)
auto.arima(ar1, ic = "aicc")

it will give me an arimaorder of (1, 0, 0). In my MWE the seeds are 2 and 3.

What I want

I want to run my MWE in parallel because I am actually running seeds 1 to 100,000, and it takes 3 hours.

I am running R on Windows.

Recommended answer

You could set up a function FUN to parallelize with parallel::parSapply. I believe printing would not work so easily from the workers (similar to progress bars and the like), so I leave it out. FUN() concatenates the ARIMA order of ar2 with the seed, so the result of parSapply is a matrix res in which you can check ARIMA order and seed afterwards.

FUN <- function(i) {
  set.seed(i)
  ar1 <- arima.sim(n=20, model=list(ar=0.8, order=c(1, 0, 0)), sd=1)
  ar2 <- auto.arima(ar1, ic="aicc")
  c(arimaorder(ar2), seed=i)
}
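Because FUN calls set.seed(i) itself, the simulated series for a given seed should not depend on which worker handles it. The sketch below checks this for the simulation step only. It assumes nothing beyond base R and the parallel package; `sim_only` is a hypothetical helper that repeats the arima.sim() call from FUN without the auto.arima() fit, so it runs without forecast installed.

```r
library(parallel)

# Hypothetical helper: the simulation step of FUN, without auto.arima()
sim_only <- function(i) {
  set.seed(i)  # seed is set inside the function, so each call is self-contained
  as.numeric(arima.sim(n = 20, model = list(ar = 0.8, order = c(1, 0, 0)), sd = 1))
}

seeds <- 1:8

# Serial reference: one column of 20 simulated values per seed
serial <- sapply(seeds, sim_only)

# Same computation spread over two worker processes
cl <- makeCluster(2)
par_res <- parSapply(cl, seeds, sim_only)
stopCluster(cl)

# Compare the serial and parallel results
same <- identical(serial, par_res)
same
```

If `same` is TRUE, a seed found in the parallel run can be reused in a plain serial session, which is what the question needs.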

To parallelize, set up a seed vector over which you will loop with parSapply. "FUN" needs to be exported to the cluster workers, and the "forecast" package needs to be loaded on them.

R <- 1e2  ## this would be your 1e5
seedv <- 1:R

library(parallel)
cl <- makeCluster(detectCores() - 1)
clusterExport(cl, c("FUN"), envir=environment())
clusterEvalQ(cl, suppressPackageStartupMessages(library(forecast)))

res <- parSapply(cl, seedv, "FUN")

stopCluster(cl)
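With a run of 1e5 seeds, an error partway through would leave the worker processes running unless stopCluster() is still reached. One possible way to guard against that, sketched here as an assumption and not as part of the answer, is to wrap the cluster in a function and register stopCluster() with on.exit(). The name `run_on_cluster` and its arguments are hypothetical.

```r
library(parallel)

# Hypothetical wrapper (not from the answer): start a cluster, apply f to
# every element of x, and always shut the workers down afterwards.
run_on_cluster <- function(x, f, n_workers = 2) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)  # runs on normal exit and on error
  parSapply(cl, x, f)
}

# Toy usage with a function that needs no extra packages
squares <- run_on_cluster(1:4, function(i) i^2)
squares
```

For the answer's FUN, the clusterEvalQ(cl, library(forecast)) call would go inside the wrapper, before parSapply.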

In the resulting matrix res,

res
#      [,1] [,2] [,3] [,4] [,5] [,6] 
# p       2    1    1    0    2  ...
# d       0    0    0    1    0  ...
# q       0    0    0    0    0  ...
# seed    1    2    3    4    5  ...

you can look up the seeds for which the ARIMA order is c(1, 0, 0).

res["seed", which(apply(res, 2, function(x) all(x[1:3] == c(1, 0, 0))))]

# [1]  2  3 11 16 17 23 24 25 28 30 33 34 42 43 45 50 51 54 59 60 63 64 66 67
# [25] 71 72 77 79 84 91 96 97
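The same lookup can also be written without apply(), which may be more convenient on a matrix with 1e5 columns. This is a sketch on a small stand-in matrix, `res_demo`, built by hand from the first five columns printed above; the names `res_demo`, `target`, and `hit` are illustrative only.

```r
# Stand-in for res: the first five columns shown in the printed output
res_demo <- rbind(p    = c(2, 1, 1, 0, 2),
                  d    = c(0, 0, 0, 1, 0),
                  q    = c(0, 0, 0, 0, 0),
                  seed = 1:5)

target <- c(1, 0, 0)

# The 3 x n order block is compared column by column, because R recycles
# `target` down each column; a column matches when all 3 entries are equal.
orders <- res_demo[c("p", "d", "q"), , drop = FALSE]
hit <- colSums(orders == target) == 3

matching_seeds <- res_demo["seed", hit]
matching_seeds
```

On this stand-in the matching seeds are 2 and 3, in line with the seeds reported in the question's MWE.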

I checked with a seedv of length 1e3 on my machine. At about 17 seconds elapsed for 1e3 seeds, I would expect an execution time of under 30 minutes for the projected length of 1e5 (17.05 s × 100 ≈ 28 min).

## needs a running cluster, i.e. run this before stopCluster(cl)
seedv <- 1:1e3
system.time(parSapply(cl, seedv, "FUN"))
# user  system elapsed 
# 0.00    0.00   17.05
