PBS/TORQUE: how do I submit a parallel job on multiple nodes?


Question


So, right now I'm submitting jobs on a cluster with qsub, but they seem to always run on a single node. I currently run them by doing

#PBS -l walltime=10
#PBS -l nodes=4:gpus=2
#PBS -r n
#PBS -N test

range_0_total=$(seq 0 $(expr $total - 1))

for i in $range_0_total
do
    $PATH_TO_JOB_EXEC/job_executable &
done
wait

I would be incredibly grateful if you could tell me if I'm doing something wrong, or if it's just that my test tasks are too small.

Solution

With your approach, you would need to have your for loop go through all of the entries in the file pointed to by $PBS_NODEFILE, and then inside your loop you would need "ssh $i $PATH_TO_JOB_EXEC/job_executable &".
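A minimal sketch of that loop, shown here as a dry run against a fabricated node file with the ssh command echoed rather than executed (the hostnames, core counts, and PATH_TO_JOB_EXEC value below are made up for illustration; a real job script would read $PBS_NODEFILE and drop the echo):

```shell
#!/bin/sh
# Fabricated stand-in for $PBS_NODEFILE: TORQUE writes one line per
# allocated core, so a 2-node job with 2 cores each lists each host twice.
nodefile=$(mktemp)
printf 'node01\nnode01\nnode02\nnode02\n' > "$nodefile"

PATH_TO_JOB_EXEC=/path/to/exec   # hypothetical; set by the real job script

launched=0
for node in $(cat "$nodefile")
do
    # In the real job script this line would be:
    #     ssh "$node" "$PATH_TO_JOB_EXEC/job_executable" &
    echo "ssh $node $PATH_TO_JOB_EXEC/job_executable &"
    launched=$((launched + 1))
done
# In the real job script, a `wait` here would block until all of the
# backgrounded ssh sessions finish.

rm -f "$nodefile"
```

Note that this launches one process per node-file entry, i.e. one per allocated core, not one per node; dedupe the host list first (e.g. with sort -u) if you want one process per node.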

The other, easier way to do this would be to replace the for loop and wait with:

pbsdsh $PATH_TO_JOB_EXEC/job_executable

This would run a copy of your program on each core assigned to your job. If you need to modify this behavior, take a look at the options available in the pbsdsh man page.
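Putting that together, a corrected version of the job script from the question might look like the following (a sketch, not tested on a cluster: the directives are copied from the question, and pbsdsh replaces the loop-and-wait):

#!/bin/sh
#PBS -l walltime=10
#PBS -l nodes=4:gpus=2
#PBS -r n
#PBS -N test

# pbsdsh spawns one instance of the program on every core the scheduler
# allocated to this job; no manual loop, backgrounding, or wait is needed.
pbsdsh $PATH_TO_JOB_EXEC/job_executable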
