Why don't parallel jobs print in RStudio?

Question

Why do scripts parallelized with mclapply() print their output when run on a cluster but not in RStudio? Just asking out of curiosity.

library(parallel)

mclapply(1:10, function(x) {
  print("Hello!")
  return(TRUE)
}, mc.cores = 2)
# "Hello!" prints under Slurm but not in RStudio

Recommended answer

None of the functions in the 'parallel' package guarantee that output sent to the standard output (stdout) or the standard error (stderr) on workers is displayed properly. This is true for all types of parallelization approaches, e.g. forked processing (mclapply()) and PSOCK clusters (parLapply()). The reason is that the package was never designed to relay output in a consistent manner.

A good test is to see whether you can capture the output via capture.output(). For example, I get:

bfr <- utils::capture.output({
  y <- lapply(1:3, FUN = print)
})
print(bfr)
## [1] "[1] 1" "[1] 2" "[1] 3"

as expected, but when I try:

bfr <- utils::capture.output({
  y <- parallel::mclapply(1:3, FUN = print)
})
print(bfr)
## character(0)

no output is captured. Interestingly though, if I call it without capturing the output, in R 4.0.1 on Linux in the terminal, I get:

y <- parallel::mclapply(1:3, FUN = print)
## [1] 1
## [1] 3
## [1] 2

Interesting, eh?

Another suggestion you might get when using local PSOCK clusters is to set the argument outfile = "" when creating the cluster. Indeed, when you try this on Linux in the terminal, it certainly looks like it works:

cl <- parallel::makeCluster(2L, outfile = "")
## starting worker pid=25259 on localhost:11167 at 17:50:03.974
## starting worker pid=25258 on localhost:11167 at 17:50:03.974

y <- parallel::parLapply(cl, 1:3, fun = print)
## [1] 1
## [1] 2
## [1] 3

But this, too, gives false hope. It turns out that you see the output only because the terminal happens to display it. This might or might not work in the RStudio Console, and you might see different behavior on Linux, macOS, and MS Windows. The key point is that your R session does not see this output at all. If we try to capture it, we get:

bfr <- utils::capture.output({
  y <- parallel::parLapply(cl, 1:3, fun = print)
})
## [1] 1
## [1] 2
## [1] 3
print(bfr)
## character(0)

Interesting, eh? But actually not surprising if you understand the inner details of the 'parallel' package.
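
If all you need is the text a worker printed, one workaround that stays within the 'parallel' package is to capture the output on the worker itself and return it as part of the result. The following is a minimal sketch under that assumption, not something the package does for you; the helper name run_and_capture and the value/output list fields are made up for illustration.

```r
# Hypothetical helper: runs on the worker, captures what is printed there,
# and returns the text together with the computed value.
run_and_capture <- function(x) {
  out <- utils::capture.output({
    print(x)        # would otherwise never reach the main R session
    value <- x^2
  })
  list(value = value, output = out)
}

cl <- parallel::makeCluster(2L)   # local PSOCK cluster
res <- parallel::parLapply(cl, 1:3, fun = run_and_capture)
parallel::stopCluster(cl)

# The captured text is now ordinary data in the main R session
bfr <- unlist(lapply(res, function(r) r$output))
print(bfr)
## [1] "[1] 1" "[1] 2" "[1] 3"
```

Because the text travels back as a regular return value, this behaves the same in a terminal and in the RStudio Console. The cost is that nothing is shown until each task has finished.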

(Disclaimer: I'm the author) The only parallel framework I'm aware of that properly relays standard output (e.g. cat(), print(), ...) and message conditions (e.g. message()) to the main R session is the future framework. You can read about the details in its 'Text and Message Output' vignette, but here's an example showing that it works:

future::plan("multicore", workers = 2) ## forked processing

bfr <- utils::capture.output({
  y <- future.apply::future_lapply(1:3, FUN = print)
})
print(bfr)
## [1] "[1] 1" "[1] 2" "[1] 3"

It works the same regardless of the underlying parallelization framework, e.g. with local PSOCK workers:

future::plan("multisession", workers = 2) ## PSOCK cluster

bfr <- utils::capture.output({
  y <- future.apply::future_lapply(1:3, FUN = print)
})
print(bfr)
## [1] "[1] 1" "[1] 2" "[1] 3"

This works the same on all operating systems and in all environments where you run R, including the RStudio Console. It also behaves the same regardless of which future map-reduce framework you use, e.g. future.apply, furrr, and foreach with doFuture.
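
The capture.output() examples exercise standard output only. Message conditions are relayed too, and since they are re-signalled in the main R session, they can be handled there like any other condition. A minimal sketch, assuming the future and future.apply packages are installed; the msgs collector and the worker function are illustrative names only:

```r
future::plan("multisession", workers = 2) ## PSOCK cluster

msgs <- character(0)
y <- withCallingHandlers(
  future.apply::future_lapply(1:3, FUN = function(x) {
    message("processing ", x)   # signalled on the worker
    x^2
  }),
  message = function(m) {
    # Runs in the main R session once the message has been relayed
    msgs <<- c(msgs, conditionMessage(m))
  }
)

future::plan("sequential")      # shut down the background workers
print(length(msgs))
```

The handler does not muffle the messages, so they are still shown in the console after being recorded in msgs.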
