Single R script on multiple nodes


Problem description

I would like to utilize CPU cores from multiple nodes to execute a single R script. Each node contains 16 cores and is assigned to me via the Slurm workload manager.

So far my code looks like the following:

library(doParallel)  # also loads foreach and parallel

ncores <- 16

List_1 <- list(...)
List_2 <- list(...)

cl <- makeCluster(ncores)   # one worker per core on this node
registerDoParallel(cl)
getDoParWorkers()

foreach(L_1 = List_1) %:% 
  foreach(L_2 = List_2) %dopar% {
    ...
  }

stopCluster(cl)

I execute it via the following command in a UNIX shell:

mpirun -np 1 R --no-save < file_path_R_script.R > another_file_path.Rout

That works fine on a single node. However, I have not figured out whether it is sufficient to increase ncores to 32 once I have access to a second node. Does R automatically include the additional 16 cores on the other node, or do I have to use another R package?

Answer

Launching an R script with mpirun does not make sense unless you also use Rmpi.

Looking at your code, you might be able to do what you want without MPI. The recipe to use 2x16 cores would be as follows.

Ask for 2 tasks and 16 CPUs per task:

#SBATCH --nodes 2
#SBATCH --ntasks 2
#SBATCH --cpus-per-task 16

Start your program with Slurm's srun command:

srun R --no-save < file_path_R_script.R > another_file_path.Rout
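Putting the directives and the launch command together, a minimal job script might look like the sketch below (the script file name you would submit with sbatch is your choice; the R script path and output path are the placeholders from the question):

```shell
#!/bin/bash
#SBATCH --nodes 2
#SBATCH --ntasks 2
#SBATCH --cpus-per-task 16

# srun launches one R instance per task, on two distinct nodes;
# each instance reads SLURM_PROCID to pick its share of the work.
srun R --no-save < file_path_R_script.R > another_file_path.Rout
```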

The srun command will start 2 instances of the R script on two distinct nodes and will set the environment variable SLURM_PROCID to 0 on one node and 1 on the other.

Use the value of SLURM_PROCID in your R script to split the work between the two processes started by srun:

library(doParallel)  # also loads foreach and parallel

ncores <- 16

taskID <- as.numeric(Sys.getenv('SLURM_PROCID'))

List_1 <- list(...)
List_2 <- list(...)

cl <- makeCluster(ncores)
registerDoParallel(cl)
getDoParWorkers()

List_1 <- split(List_1, 1:2)[[taskID + 1]]  # split the work based on the value of SLURM_PROCID

foreach(L_1 = List_1) %:% 
  foreach(L_2 = List_2) %dopar% {
    ...
  }

stopCluster(cl)
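To see how the split() line divides the outer list between the two tasks, here is a small standalone illustration (the four-element list is hypothetical, not from the question):

```r
# split() with the factor 1:2 recycles 1,2,1,2,... over the elements,
# so task 0 gets the odd-positioned elements and task 1 the even ones.
List_1 <- list("a", "b", "c", "d")

parts <- split(List_1, 1:2)

task0 <- parts[[0 + 1]]  # what the process with SLURM_PROCID == 0 would keep
task1 <- parts[[1 + 1]]  # what the process with SLURM_PROCID == 1 would keep
```

With four elements, task0 holds the first and third elements and task1 the second and fourth, so the two nodes process disjoint halves of the work.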

You will then need to save each partial result on disk and merge the partial results into a single full result.
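The answer does not show the merge step; one way to sketch it (file names are hypothetical) is for each task to saveRDS() its result under a name derived from its taskID, then read all the files back in a post-processing step. The snippet below simulates this with placeholder payloads in a temporary directory:

```r
# Inside the job, each task would save its own partial result, e.g.:
#   saveRDS(result, sprintf("partial_result_%d.rds", taskID))
# Simulated here by writing a placeholder file for each of the two tasks:
dir <- tempdir()
for (id in 0:1) {
  partial <- list(sprintf("result from task %d", id))  # placeholder payload
  saveRDS(partial, file.path(dir, sprintf("partial_result_%d.rds", id)))
}

# Post-processing step, run once after both tasks have finished:
files <- file.path(dir, sprintf("partial_result_%d.rds", 0:1))
full_result <- do.call(c, lapply(files, readRDS))  # one combined list
```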
