GNU parallel --jobs option using multiple nodes on a cluster with multiple CPUs per node

Views: 321 Published: 2020/11/23 22:00:30 Tags: hpc gnu-parallel

Problem description

I am using GNU parallel to launch code on a high-performance computing (HPC) cluster that has 2 CPUs per node. The cluster uses the TORQUE Portable Batch System (PBS). My question is how the --jobs option for GNU parallel works in this scenario.

When I run a PBS script that calls GNU parallel without the --jobs option, like this:

#PBS -lnodes=2:ppn=2
...
parallel --env $PBS_O_WORKDIR --sshloginfile $PBS_NODEFILE \
  matlab -nodisplay -r "\"cd $PBS_O_WORKDIR,primes1({})\"" ::: 10 20 30 40

it looks like it only uses one CPU per node, and it also produces the following error stream:

bash: parallel: command not found
parallel: Warning: Could not figure out number of cpus on galles087 (). Using 1.
bash: parallel: command not found
parallel: Warning: Could not figure out number of cpus on galles108 (). Using 1.

This looks like one error for each node. I don't understand the first part (bash: parallel: command not found), but the second part tells me it is using only one CPU on that node.

When I add the option -j2 to the parallel call, the errors go away, and I think it is using two CPUs per node. I am still a newbie to HPC, so my way of checking this is to output date-time stamps from my code (the dummy MATLAB code takes tens of seconds to complete). My questions are:

Am I using the --jobs option correctly? Is it correct to specify -j2 because I have 2 CPUs per node? Or should I be using -jN, where N is the total number of CPUs (number of nodes multiplied by number of CPUs per node)?
It appears that GNU parallel attempts to determine the number of CPUs per node on its own. Is there a way that I can make this work properly?
Is there any meaning to the bash: parallel: command not found message?
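
For reference, the arithmetic behind the first question (per-node CPUs versus total CPUs) can be checked from the node file itself. The sketch below assumes that TORQUE writes one line to $PBS_NODEFILE per allocated processor, which is its usual behaviour, and that the GNU parallel manual's description of --jobs as the number of job slots on each machine applies here. The function name summarise_nodefile and the fallback sample file are made up for illustration; the host names are simply the ones that appear in the error stream above.

```shell
#!/bin/sh
# Summarise a TORQUE-style node file (assumed: one line per allocated processor).
# summarise_nodefile is a hypothetical helper, not part of GNU parallel or TORQUE.
summarise_nodefile() {
    # Print "<host> <slots>" for each distinct host, sorted by host name.
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Off-cluster fallback: build a sample file mimicking -lnodes=2:ppn=2.
nodefile=${PBS_NODEFILE:-}
if [ -z "$nodefile" ]; then
    nodefile=$(mktemp)
    printf '%s\n' galles087 galles087 galles108 galles108 > "$nodefile"
fi

summarise_nodefile "$nodefile"

# Number of distinct hosts and total number of processor slots.
hosts=$(sort -u "$nodefile" | wc -l | tr -d ' ')
total=$(wc -l < "$nodefile" | tr -d ' ')
echo "hosts=$hosts total_slots=$total"
```

If the per-host count printed here is 2, then under the assumptions above -j2 would correspond to two job slots on each of the listed hosts, while total_slots is the product of nodes and CPUs per node.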
