How to set .libPaths (checkpoint) on workers when running parallel computation in R
I use the checkpoint package for reproducible data analysis. Some of the computations take a long time, so I want to run them in parallel. When run in parallel, however, the checkpoint is not set on the workers, so I get the error message "there is no package called xy" (because the package is not installed in my default library directory).
How can I make sure that each worker uses the package versions in the checkpoint folder? I tried to set .libPaths in the foreach code, but this does not seem to work. I would also prefer to set the checkpoint/libPaths once globally rather than in every foreach call.
Another option would be to change the .Rprofile file, but I do not want to do this.
checkpoint::checkpoint("2018-06-01")
library(foreach)
library(doFuture)
library(future)
doFuture::registerDoFuture()
future::plan("multisession")
l <- .libPaths()
# Code to run in parallel does not make much sense of course but I wanted to keep it simple.
res <- foreach::foreach(
  x = unique(iris$Species),
  lib.path = l
) %dopar% {
  .libPaths(lib.path)
  stringr::str_c(x, "_")
}
Error in { : task 2 failed - "there is no package called 'stringr'"
Author of the future package here.
Passing the library path of the master R process as a global variable libs and setting it on each worker using .libPaths(libs) should be enough:
## Use the CRAN checkpoint from 2018-07-24 to get future (>= 1.9.0) [1];
## otherwise the stdout below won't be relayed back to the master
## R process. Setting .libPaths() also works in older
## versions of the future package.
## [1] https://cran.microsoft.com/snapshot/2018-07-24/web/packages/future
checkpoint::checkpoint("2018-07-24")
stopifnot(packageVersion("future") >= "1.9.0")
libs <- .libPaths()
print(libs)
### [1] "/home/hb/.checkpoint/2018-07-24/lib/x86_64-pc-linux-gnu/3.5.1"
### [2] "/home/hb/.checkpoint/R-3.5.1"
### [3] "/usr/lib/R/library"
library(foreach)
doFuture::registerDoFuture()
future::plan("multisession")
res <- foreach::foreach(x = unique(iris$Species)) %dopar% {
  ## Use the same library paths as the master R session
  .libPaths(libs)
  cat(sprintf("Library paths used by worker (PID %d):\n", Sys.getpid()))
  cat(sprintf(" - %s\n", sQuote(.libPaths())))
  stringr::str_c(x, "_")
}
### - ‘/home/hb/.checkpoint/2018-07-24/lib/x86_64-pc-linux-gnu/3.5.1’
### - ‘/home/hb/.checkpoint/R-3.5.1’
### - ‘/usr/lib/R/library’
### Library paths used by worker (PID 9394):
### - ‘/home/hb/.checkpoint/2018-07-24/lib/x86_64-pc-linux-gnu/3.5.1’
### - ‘/home/hb/.checkpoint/R-3.5.1’
### - ‘/usr/lib/R/library’
### Library paths used by worker (PID 9412):
### - ‘/home/hb/.checkpoint/2018-07-24/lib/x86_64-pc-linux-gnu/3.5.1’
### - ‘/home/hb/.checkpoint/R-3.5.1’
### - ‘/usr/lib/R/library’
str(res)
### List of 3
### $ : chr "setosa_"
### $ : chr "versicolor_"
### $ : chr "virginica_"
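A likely reason the attempt in the question fails on task 2: foreach() treats every named argument as an iteration variable and steps through them together, so lib.path = l hands each task only one element of l rather than the whole vector. Task 1 would then receive the checkpoint library itself, while task 2 would receive a path that presumably does not contain stringr. The sketch below illustrates this lockstep behaviour with made-up values (the strings in pkgs and paths are placeholders, not real library paths) and uses the sequential %do% operator so that no workers are needed.

```r
library(foreach)

## Placeholder values, purely illustrative
pkgs  <- c("a", "b", "c")
paths <- c("lib1", "lib2", "lib3")

## Both arguments are iterated together: task i sees pkgs[i] and paths[i]
seen <- foreach::foreach(x = pkgs, lib.path = paths) %do% {
  paste(x, lib.path)
}
str(seen)

## Referring to a variable from the enclosing environment instead makes
## the whole vector available in every task
lens <- foreach::foreach(x = pkgs) %do% {
  length(paths)
}
str(lens)
```

This is why the working code refers to libs as a global variable inside the loop body instead of passing it as a foreach() argument.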
FYI, it is on future's roadmap to make it easier to pass down the library path(s) to workers.
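Until such support exists, one way to approximate "set it once globally" is to start the workers yourself and set the library paths on them a single time, since a persistent worker keeps its .libPaths() between tasks. The sketch below uses only the base parallel package. The worker count of 2 and the helper name set_lib_paths are arbitrary choices for illustration, not part of any package. A cluster created this way can then be handed to future with future::plan(future::cluster, workers = cl); that step is left out here to keep the example free of extra packages.

```r
## Library paths of the main R session (after checkpoint() has been called)
libs <- .libPaths()

## Start two persistent background R workers (the count is arbitrary)
cl <- parallel::makeCluster(2L)

## Hypothetical helper: set the library paths on the worker it runs on.
## A wrapper is used rather than passing .libPaths itself, so that each
## worker calls its own copy of .libPaths().
set_lib_paths <- function(paths) {
  .libPaths(paths)
  invisible(NULL)
}
invisible(parallel::clusterCall(cl, set_lib_paths, libs))

## Check what each worker reports now
worker_libs <- parallel::clusterEvalQ(cl, .libPaths())
str(worker_libs)

## Shut the workers down once all parallel work is finished
parallel::stopCluster(cl)
```

When the cluster is passed on to future, stopCluster() should only be called after the last foreach loop has finished.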
My details:
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] foreach_1.4.4
loaded via a namespace (and not attached):
[1] drat_0.1.4 compiler_3.5.1 BiocManager_1.30.2 parallel_3.5.1 tools_3.5.1 listenv_0.7.0 doFuture_0.6.0
[8] codetools_0.2-15 iterators_1.0.10 digest_0.6.15 globals_0.12.1 checkpoint_0.4.5 future_1.9.0