Understanding the differences between mclapply and parLapply in R

Question

I've recently started using parallel techniques in R for a project and have my program working on Linux systems using mclapply from the parallel package. However, I've hit a roadblock in my understanding of parLapply for Windows.

Using mclapply I can set the number of cores, iterations, and pass that to an existing function in my workspace.

mclapply(1:8, function(z) adder(z, 100), mc.cores=4)
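
For reference, here is a minimal self-contained sketch of that call. The body of adder is a hypothetical stand-in (the question does not show it), and the core count falls back to 1 on Windows, where mclapply cannot fork and only accepts mc.cores = 1.

```r
library(parallel)

# Hypothetical stand-in for the existing workspace function.
adder <- function(a, b) a + b

# Forking is unavailable on Windows, where mclapply only accepts mc.cores = 1.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Workers are forked from the current session, so adder is visible
# to them without any explicit export step.
res <- mclapply(1:8, function(z) adder(z, 100), mc.cores = n_cores)
unlist(res)
```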

I don't seem to be able to achieve the same on Windows using parLapply. As I understand it, I need to pass all the variables through using clusterExport() and pass the actual function I want to apply as an argument.

Is this correct or is there something similar to the mclapply function that's applicable to Windows?

Recommended answer

The beauty of mclapply is that the worker processes are all created as clones of the master right at the point that mclapply is called, so you don't have to worry about reproducing your environment on each of the cluster workers. Unfortunately, that isn't possible on Windows, because mclapply relies on forking, which Windows does not support.

When using parLapply, you generally have to perform the following additional steps:

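  • Create a PSOCK cluster
  • Register the cluster if desired
  • Load necessary packages on the cluster workers
  • Export necessary data and functions to the global environment of the cluster workers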

Also, when you're done, it's good practice to shut down the PSOCK cluster using stopCluster.
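
One way to make sure the cluster is always shut down, even if a worker raises an error, is to create it inside a function and register stopCluster with on.exit. This is only a sketch: with_cluster and its n_workers argument are hypothetical names, not part of the parallel package.

```r
library(parallel)

# Hypothetical helper: run FUN over X on a temporary PSOCK cluster.
with_cluster <- function(X, FUN, n_workers = 2L) {
  cl <- makePSOCKcluster(n_workers)
  # on.exit runs whether the function returns normally or fails,
  # so the worker processes are always shut down.
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, X, FUN)
}

res <- with_cluster(1:4, function(z) z * 2)
unlist(res)
```

The trade-off is that the workers are started and stopped on every call, which costs startup time; for many calls, a long-lived cluster is usually preferable.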

Here's a translation of your example to parLapply:

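library(parallel)
cl <- makePSOCKcluster(4)                # start 4 worker processes
setDefaultCluster(cl)                    # register cl as the default cluster
adder <- function(a, b) a + b
clusterExport(NULL, c('adder'))          # copy adder to each worker's global environment
parLapply(NULL, 1:8, function(z) adder(z, 100))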

If your adder function requires a package, you'll have to load that package on each of the workers before calling it with parLapply. You can do that quite easily with clusterEvalQ:

clusterEvalQ(NULL, library(MASS))
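
As a rough illustration of what that call does, the sketch below loads MASS on two workers and then calls a MASS function by its bare name inside parLapply. It assumes MASS is installed (it ships with standard R distributions as a recommended package); ginv is used only as a convenient MASS function to call, and the cluster is passed explicitly rather than through the default.

```r
library(parallel)

cl <- makePSOCKcluster(2)

# clusterEvalQ evaluates the expression on every worker and returns one
# result per worker. library() returns the attached package names, so
# each element should include "MASS".
attached <- clusterEvalQ(cl, library(MASS))

# Worker code can now call MASS functions without the MASS:: prefix.
# For a 1x1 matrix, the generalized inverse is simply 1 / x.
res <- parLapply(cl, c(2, 4), function(x) ginv(matrix(x))[1, 1])

stopCluster(cl)
```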

Note that the NULL first argument to clusterExport, clusterEvalQ and parLapply indicates that they should use the cluster object registered via setDefaultCluster. That can be very useful if your program is using mclapply in many different functions, so that you don't have to pass the cluster object to every function that needs it when converting your program to use parLapply.
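
The difference between the two styles can be sketched as follows. Both helper functions (square_all and square_all_on) are hypothetical names made up for this illustration; the first relies on the registered default cluster, the second takes the cluster as an argument.

```r
library(parallel)

cl <- makePSOCKcluster(2)
setDefaultCluster(cl)

# Hypothetical helper with no cluster argument: NULL means
# "use the cluster registered via setDefaultCluster".
square_all <- function(xs) parLapply(NULL, xs, function(x) x^2)

# Equivalent hypothetical helper with the cluster passed explicitly.
square_all_on <- function(cl, xs) parLapply(cl, xs, function(x) x^2)

via_default  <- square_all(1:4)
via_explicit <- square_all_on(cl, 1:4)

setDefaultCluster(NULL)  # unregister the default before shutting down
stopCluster(cl)
```

Both calls should return the same list; the default-cluster form just saves threading cl through every function signature.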

Of course, adder may call other functions in your global environment which call other functions, etc. In that case, you'll have to export them as well and load any packages that they need. Also note that if any variables that you've exported change during the course of your program, you will have to export them again in order to update them on the cluster workers. Again, that isn't necessary with mclapply because it always creates/clones/forks the workers whenever it is called.
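
The stale-variable issue can be seen with a small sketch. The variable name offset is hypothetical; clusterEvalQ is used here only to read back the copy that each worker currently holds.

```r
library(parallel)

cl <- makePSOCKcluster(2)

offset <- 100
clusterExport(cl, "offset", envir = environment())
before <- unlist(clusterEvalQ(cl, offset))  # each worker's copy

offset <- 200                               # changes the master's copy only
stale <- unlist(clusterEvalQ(cl, offset))   # workers still hold the old value

clusterExport(cl, "offset", envir = environment())
fresh <- unlist(clusterEvalQ(cl, offset))   # workers updated after re-export

stopCluster(cl)
```

Here before and stale should both be 100 on every worker, and fresh should be 200 only after the second clusterExport call.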
