Set up a Torque/Moab cluster to use multiple cores per node with a single loop


Question

I have a memory-limited script that uses only one foreach loop, but I'd like to get 2 iterations running on node1 and 2 iterations running on node2. The previously linked question lets you start a SOCK cluster across the nodes for the outer loop and then an MC cluster for the inner loop, and I think that doesn't make use of the multiple cores on each node. If I call registerDoMC(2) after registerDoSNOW(cl), I get this warning:

Warning message: closing unused connection 3 (<-compute-1-30.local:11880)

Thanks.

The solution from the previous question works fine for the problem asked there. See my example below for what I want.

Starting an interactive job with 2 nodes and 2 cores per node (ppn=2):

qsub -I -l nodes=2:ppn=2

After starting R:

library(doParallel)
f <- Sys.getenv('PBS_NODEFILE')
nodes <- unique(if (nzchar(f)) readLines(f) else 'localhost')
print(nodes)

Here are the two nodes I'm running on:

[1] "compute-3-15" "compute-1-32"

Start the SOCK cluster on these two nodes:

cl <- makePSOCKcluster(nodes, outfile='')

I'm not sure why both workers seem to be on compute-3-15:

starting worker pid=25473 on compute-3-15.local:11708 at 16:54:17.048
starting worker pid=14746 on compute-3-15.local:11708 at 16:54:17.523

But register the two nodes and run a single foreach loop:

registerDoParallel(cl)
r=foreach(i=seq(1,6),.combine='c') %dopar% { Sys.info()[['nodename']]}
print(r)

The output of r indicates that both nodes were used, though:

 [1] "compute-3-15.local" "compute-1-32.local" "compute-3-15.local"
 [4] "compute-1-32.local" "compute-3-15.local" "compute-3-15.local"

Now, what I'd really like is for that foreach loop to run on 4 cores, 2 on each node.

library(doMC)
registerDoMC(4)
r=foreach(i=seq(1,6),.combine='c') %dopar% { Sys.info()[['nodename']]}
print(r)

The output indicates that only 1 node was used, but presumably both cores on that node.

[1] "compute-3-15.local" "compute-3-15.local" "compute-3-15.local"
[4] "compute-3-15.local" "compute-3-15.local" "compute-3-15.local"

How do I get a SINGLE foreach loop to use multiple cores on multiple nodes?

Recommended answer

To use multiple nodes with foreach/doParallel, you specify a vector of hostnames when calling makePSOCKcluster. If you want to use multiple cores on those hosts, you simply specify each hostname multiple times so that makePSOCKcluster starts multiple workers per host.
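As a minimal sketch of that idea, the worker list can be built by repeating each hostname once per worker wanted on that host. The hostnames below are the ones shown in the question, and cores_per_node is a hypothetical variable assumed to match the ppn value of the qsub request. The makePSOCKcluster call is left commented out because it needs a real allocation.

```r
# Hostnames taken from the question's output; substitute your own nodes.
nodes <- c("compute-3-15", "compute-1-32")

# Hypothetical setting, assumed to match ppn in the qsub request.
cores_per_node <- 2

# Repeat each hostname once per worker wanted on that host.
workers <- rep(nodes, each = cores_per_node)
print(workers)

# On a real allocation this vector would be passed to makePSOCKcluster:
# cl <- parallel::makePSOCKcluster(workers, outfile = "")
```

Each entry in the vector becomes one worker process, so a hostname listed twice gets two workers.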

Since you're using the Torque resource manager, you could use the following function to generate the node list. Its argument limits the maximum number of workers started on any one node:

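getnodelist <- function(maxpernode=100) {
  # Torque stores the path of the node file in PBS_NODEFILE
  f <- Sys.getenv('PBS_NODEFILE')
  # Outside a Torque job, fall back to three local workers
  x <- if (nzchar(f)) readLines(f) else rep('localhost', 3)
  # Count how many times each hostname appears in the node file
  d <- as.data.frame(table(x), stringsAsFactors=FALSE)
  # Repeat each hostname once per worker, capped at maxpernode
  rep(d$x, pmin(d$Freq, maxpernode))
}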
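To see what the function returns without a running job, the sketch below simulates a node file. It assumes Torque lists each hostname once per allocated core, so nodes=2:ppn=2 gives four lines. The temporary file and the hostnames are stand-ins for illustration, and getnodelist is repeated so the snippet runs on its own.

```r
getnodelist <- function(maxpernode = 100) {
  f <- Sys.getenv("PBS_NODEFILE")
  x <- if (nzchar(f)) readLines(f) else rep("localhost", 3)
  d <- as.data.frame(table(x), stringsAsFactors = FALSE)
  rep(d$x, pmin(d$Freq, maxpernode))
}

# Simulated node file for nodes=2:ppn=2 (assumption: one line per core).
nodefile <- tempfile()
writeLines(c("compute-3-15", "compute-3-15",
             "compute-1-32", "compute-1-32"), nodefile)
Sys.setenv(PBS_NODEFILE = nodefile)

# Default cap: one worker per allocated core.
all_cores <- getnodelist()
print(all_cores)

# Cap of 1: a single worker per node, like unique() in the question.
one_each <- getnodelist(1)
print(one_each)

# Clean up the simulated environment.
Sys.unsetenv("PBS_NODEFILE")
unlink(nodefile)
```

Note that table() sorts the hostnames, so the returned order can differ from the order in the node file.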

Here's an example that uses this function to run no more than two workers on each node that was allocated by Torque:

library(doParallel)
nodelist <- getnodelist(2)
print(nodelist)
cl <- makePSOCKcluster(nodelist, outfile='')
registerDoParallel(cl)
r <- foreach(i=seq_along(nodelist), .combine='c') %dopar% {
  Sys.info()[['nodename']]
}
cat('results:\n')
print(r)
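One way to check how the tasks were spread is to tabulate the hostnames returned by the loop. The sketch below uses the six hostnames from the question's first run as stand-in data. On a real run, r would come from the foreach call above.

```r
# Worker hostnames copied from the question's first foreach run.
r <- c("compute-3-15.local", "compute-1-32.local", "compute-3-15.local",
       "compute-1-32.local", "compute-3-15.local", "compute-3-15.local")

# Count how many tasks each host executed.
tasks_per_node <- table(r)
print(tasks_per_node)
```

The counts show which hosts took part, but not how many workers ran on each host. Returning Sys.getpid() alongside the nodename would distinguish the workers.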

Note that you cannot use the doMC backend to execute tasks on multiple nodes, since doMC uses the mclapply function, which can only create workers on the local machine. To use multiple nodes, you have to use a backend such as doParallel, doSNOW, or doMPI.
