How to fork/parallelize process in purrr::pmap
Problem description
I have the following code that does serial processing with purrr::pmap:
library(tidyverse)
set.seed(1)
params <- tribble(
  ~mean, ~sd, ~n,
  5, 1, 1,
  10, 5, 3,
  -3, 10, 5
)
params %>%
  pmap(rnorm)
#> [[1]]
#> [1] 4.373546
#>
#> [[2]]
#> [1] 10.918217 5.821857 17.976404
#>
#> [[3]]
#> [1] 0.2950777 -11.2046838 1.8742905 4.3832471 2.7578135
How can I parallelize (fork) the process above so that it runs faster and produces an identical result?
Here I use rnorm for illustration purposes; in reality I have a function that does heavy-duty work and needs parallelizing.
I'm open to non-purrr (non-tidyverse) solutions, as long as they produce an identical result given the rnorm function and params as input.
In short: a "parallel pmap()", allowing a similar syntax to pmap(), could look like lift(mcmapply)() or lift(clusterMap)().
If you're not on Windows, you could:
library(parallel)
# forking
# (purrr::lift() is deprecated as of purrr 1.0.0; do.call() is the base R equivalent)
set.seed(1, "L'Ecuyer")
params %>%
  lift(mcmapply, mc.cores = detectCores() - 1)(FUN = rnorm)
# [[1]]
# [1] 4.514604
#
# [[2]]
# [1] 0.7022156 0.8734875 5.0250478
#
# [[3]]
# [1] 8.7704060 11.7217925 -12.8776289 -10.7466152 0.5177089
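Note that the forked output above differs from the serial pmap() output in the question, because each child process draws from its own random-number stream. If the results have to match a serial run exactly, one possible approach (a sketch under assumptions, not something the code above does) is to give every parameter row its own seed and set it inside the worker function. The seed column and seeded_rnorm() below are hypothetical names, and base Map() stands in for pmap() so the sketch has no tidyverse dependency:

```r
library(parallel)

# Same parameter table as in the question, plus a hypothetical per-row seed.
params <- data.frame(
  mean = c(5, 10, -3),
  sd   = c(1, 5, 10),
  n    = c(1, 3, 5),
  seed = c(101, 102, 103)
)

# Hypothetical wrapper: seeding inside the call makes each draw depend only
# on its own row, not on which process happens to run it.
seeded_rnorm <- function(n, mean, sd, seed) {
  set.seed(seed)
  rnorm(n, mean, sd)
}

# Forking is unavailable on Windows, where mcmapply() requires mc.cores = 1.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Serial reference run (columns are matched to arguments by name).
serial <- do.call(Map, c(list(f = seeded_rnorm), params))

# Forked run over the same rows.
forked <- do.call(
  mcmapply,
  c(params, list(FUN = seeded_rnorm, SIMPLIFY = FALSE, mc.cores = cores))
)

# Compare the two runs element by element.
identical(unname(serial), unname(forked))
```

The trade-off is that the seeds become part of the input: the draws are reproducible, but the rows are only as independent as the seeds chosen for them.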
Edit
Here is a "cleaner" option that should feel more like using pmap:
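# leave one core free, but never drop below one worker
nc <- max(parallel::detectCores() - 1, 1L)
# pmap()-like wrapper around mcmapply(): .l supplies the per-call arguments,
# ... is passed unchanged to every call of .f via MoreArgs
par_pmap <- function(.l, .f, ..., mc.cores = getOption("mc.cores", 2L)) {
  do.call(
    parallel::mcmapply,
    c(.l, list(FUN = .f, MoreArgs = list(...), SIMPLIFY = FALSE, mc.cores = mc.cores))
  )
}
f <- function(n, mean, sd, ...) rnorm(n, mean, sd)
params %>%
  par_pmap(f, some_other_arg_to_f = "foo", mc.cores = nc)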
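A quick way to check that par_pmap() behaves like a serial pmap() is to run it with a deterministic function and compare against a serial call. A minimal sketch, where scale_shift() and its offset argument are made-up stand-ins for the heavy function, and base Map() plays the role of the serial pmap():

```r
library(parallel)

# par_pmap() as defined above, repeated so the sketch is self-contained.
par_pmap <- function(.l, .f, ..., mc.cores = getOption("mc.cores", 2L)) {
  do.call(
    parallel::mcmapply,
    c(.l, list(FUN = .f, MoreArgs = list(...), SIMPLIFY = FALSE, mc.cores = mc.cores))
  )
}

params <- data.frame(mean = c(5, 10, -3), sd = c(1, 5, 10), n = c(1, 3, 5))

# Hypothetical deterministic stand-in for the heavy function.
scale_shift <- function(n, mean, sd, offset = 0) mean + sd * seq_len(n) + offset

# Forking is unavailable on Windows, where mcmapply() requires mc.cores = 1.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Extra arguments in `...` are forwarded to every call through MoreArgs.
parallel_res <- par_pmap(params, scale_shift, offset = 100, mc.cores = cores)

# Serial reference with the same per-row arguments and the same extra argument.
serial_res <- do.call(
  Map,
  c(list(f = scale_shift), params, list(MoreArgs = list(offset = 100)))
)

# Compare the parallel and serial results.
all.equal(parallel_res, serial_res)
```

Because the columns of params are passed by name, their order in the table does not matter, only that the names match the arguments of the function.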
If you're on Windows (or any other OS), you could:
library(parallel)
# (Parallel SOCKet cluster)
cl <- makeCluster(detectCores() - 1)
clusterSetRNGStream(cl, 1)
params %>%
  lift(clusterMap, cl = cl)(fun = rnorm)
# [[1]]
# [1] 5.460811
#
# [[2]]
# [1] 7.573021 6.870994 5.633097
#
# [[3]]
# [1] -21.595569 -21.253025 -12.949904 -4.817278 -7.650049
stopCluster(cl)
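The cluster version can be wrapped in the same spirit as par_pmap(), so that the workers are always shut down, even if the function errors. A minimal sketch, assuming a hypothetical helper named cluster_pmap() with made-up n_workers and seed arguments; it is not part of the parallel package:

```r
library(parallel)

# Hypothetical helper: a pmap()-like front end to parallel::clusterMap().
cluster_pmap <- function(.l, .f, ..., n_workers = 2L, seed = NULL) {
  cl <- makeCluster(n_workers)
  # Stop the workers when the function exits, whether normally or on error.
  on.exit(stopCluster(cl), add = TRUE)
  # Optionally seed the workers' L'Ecuyer-CMRG streams.
  if (!is.null(seed)) clusterSetRNGStream(cl, seed)
  do.call(
    clusterMap,
    c(
      list(cl = cl, fun = .f),
      as.list(.l),
      list(MoreArgs = list(...), SIMPLIFY = FALSE)
    )
  )
}

params <- data.frame(mean = c(5, 10, -3), sd = c(1, 5, 10), n = c(1, 3, 5))

# One vector of random draws per row of params.
draws <- cluster_pmap(params, rnorm, n_workers = 2L, seed = 1)
lengths(draws)
```

Starting a cluster on every call has a cost, so for repeated use it may be preferable to create the cluster once and pass it in, as in the code above.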
In case you're more inclined to use foreach, you could:
library(doParallel)
# (fork by default on my Linux machine, should PSOCK by default on Windows)
registerDoParallel(cores = detectCores() - 1)
set.seed(1, "L'Ecuyer")
lift(foreach)(params) %dopar%
  rnorm(n, mean, sd)
# [[1]]
# [1] 4.514604
#
# [[2]]
# [1] 0.7022156 0.8734875 5.0250478
#
# [[3]]
# [1] 8.7704060 11.7217925 -12.8776289 -10.7466152 0.5177089
stopImplicitCluster()