Make use of all CPUs on SLURM
Question
I would like to run a job on the cluster. Different nodes have different numbers of CPUs, and I have no idea which nodes will be assigned to me. What are the proper options so that the job can create as many tasks as there are CPUs across all the allocated nodes?
#!/bin/bash -l
#SBATCH -p normal
#SBATCH -N 4
#SBATCH -t 96:00:00
srun -n 128 ./run
Answer
One dirty hack to achieve the objective is to use the environment variables provided by SLURM. For a sample sbatch file:
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=res.txt
#SBATCH --time=10:00
#SBATCH --nodes=2
echo $SLURM_CPUS_ON_NODE
echo $SLURM_JOB_NUM_NODES
num_core=$SLURM_CPUS_ON_NODE
num_node=$SLURM_JOB_NUM_NODES
proc_num=$(( num_core * num_node ))
echo $proc_num
srun -n $proc_num ./run
Only the number of nodes is requested in the job script. $SLURM_CPUS_ON_NODE provides the number of CPUs per node. You can use it along with other environment variables (e.g. $SLURM_JOB_NUM_NODES) to work out the number of tasks possible. In the script above, the dynamic task calculation assumes the nodes are homogeneous (i.e. $SLURM_CPUS_ON_NODE gives a single number).
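To make the arithmetic concrete, here is a minimal sketch with sample values standing in for what SLURM would export inside a real allocation (the numbers 16 and 2 are illustrative, not from the answer above):

```shell
# Illustrative values; inside a real job, SLURM exports these itself.
num_core=16   # stands in for $SLURM_CPUS_ON_NODE (homogeneous nodes)
num_node=2    # stands in for $SLURM_JOB_NUM_NODES
proc_num=$(( num_core * num_node ))
echo "$proc_num"   # 32 -- the value that would be passed to srun -n
```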
For heterogeneous nodes, $SLURM_CPUS_ON_NODE can give multiple values (e.g. 2,3 if the allocated nodes have 2 and 3 CPUs). In such a scenario, $SLURM_JOB_NODELIST can be used to find the number of CPUs corresponding to each allocated node, and from that you can calculate the required number of tasks.
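One way to sketch the heterogeneous case is to parse $SLURM_JOB_CPUS_PER_NODE, which SLURM sets to a compressed list such as 2(x2),3 (two nodes with 2 CPUs each and one node with 3). The snippet below sums it into a total task count; the fallback sample value is made up for illustration, since the variable only exists inside a job:

```shell
# Sum CPU counts from $SLURM_JOB_CPUS_PER_NODE, which compresses
# repeated counts, e.g. "2(x2),3" = two nodes with 2 CPUs, one with 3.
# The fallback value here is an illustrative sample, not real data.
cpus_per_node="${SLURM_JOB_CPUS_PER_NODE:-2(x2),3}"

total=0
IFS=',' read -ra groups <<< "$cpus_per_node"
for g in "${groups[@]}"; do
    if [[ "$g" == *"(x"* ]]; then
        count="${g%%(*}"    # CPUs per node in this group
        mult="${g#*x}"      # strip up to "x", leaving e.g. "2)"
        mult="${mult%)}"    # nodes in this group
    else
        count="$g"; mult=1  # a group of one node has no "(xN)" suffix
    fi
    total=$(( total + count * mult ))
done
echo "$total"   # with the sample value above: 7
```

The resulting total could then be handed to srun as `srun -n "$total" ./run`.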