SLURM `srun` vs `sbatch` and their parameters
Problem description
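I am trying to understand the difference between SLURM's `srun` and `sbatch` commands. I would be happy with a general explanation rather than specific answers to the following questions, but here are some specific points of confusion that can serve as a starting point and give an idea of what I'm looking for.

According to the documentation, `srun` is for submitting jobs, and `sbatch` is for submitting jobs for later execution, but the practical difference is unclear to me, and their behavior seems to be the same. For example, I have a cluster with 2 nodes, each with 2 CPUs. If I execute `srun testjob.sh &` five times in a row, it will nicely queue up the fifth job until a CPU becomes available, as will executing `sbatch testjob.sh`.

To make the question more concrete, I think a good place to start might be: what are some things that I can do with one that I cannot do with the other, and why?

Many of the arguments to both commands are the same. The ones that seem the most relevant are `--ntasks`, `--nodes`, `--cpus-per-task`, and `--ntasks-per-node`. How are these related to each other, and how do they differ for `srun` vs `sbatch`?

One particular difference is that `srun` will cause an error if `testjob.sh` does not have executable permission (that is, if `chmod +x testjob.sh` has not been run), whereas `sbatch` will happily run it. What is happening "under the hood" that causes this to be the case?

The documentation also mentions that `srun` is commonly used inside of `sbatch` scripts. This leads to the question: how do they interact with each other, and what is the "canonical" use case for each of them? Specifically, would I ever use `srun` by itself?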
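Answer

The documentation says

srun is used to submit a job for execution in real time

while

sbatch is used to submit a job script for later execution.

They both accept practically the same set of parameters. The main difference is that `srun` is interactive and blocking (you get the result in your terminal and you cannot enter other commands until it has finished), while `sbatch` is batch processing and non-blocking (results are written to a file and you can submit other commands right away).

If you run `srun` in the background with the `&` sign, you remove its blocking behaviour: it becomes interactive but non-blocking. It is still interactive, though, meaning that the output will clutter your terminal and the `srun` processes are tied to your terminal. If you disconnect, you will lose control over them, or they might be killed (depending, basically, on whether they use `stdout`). They will also be killed if the machine you connect to in order to submit jobs is rebooted.

If you use `sbatch`, you submit your job and it is handled by Slurm; you can disconnect, kill your terminal, etc. with no consequence. Your job is no longer linked to a running process.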
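The contrast can be tried with a small sketch like the one below. It assumes a working Slurm installation; the file name `demo_job.sh` and its contents are purely illustrative and do not come from the question. The Slurm calls are skipped when `srun` and `sbatch` are not on the `PATH`.

```shell
#!/usr/bin/env bash
# Write a tiny job script. The name demo_job.sh is only an illustration.
cat > demo_job.sh <<'EOF'
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --output=demo_job_%j.out
hostname
sleep 10
EOF
chmod +x demo_job.sh   # srun runs the file directly, so it must be executable

if command -v srun >/dev/null 2>&1 && command -v sbatch >/dev/null 2>&1; then
    # Blocking: the prompt returns only once the job has finished,
    # and the output appears in this terminal.
    # (The #SBATCH lines are plain comments as far as srun is concerned.)
    srun --ntasks=1 ./demo_job.sh

    # Non-blocking: sbatch returns as soon as the job is queued,
    # and the output goes to demo_job_<jobid>.out instead of the terminal.
    sbatch demo_job.sh
else
    echo "Slurm commands not found; nothing was submitted."
fi
```

After the `sbatch` call returns, the terminal can be closed without affecting the queued job, which is the point made above.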
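What are some things that I can do with one that I cannot do with the other, and why?
A feature that is available to `sbatch` and not to `srun` is job arrays. As `srun` can be used within an `sbatch` script, there is nothing that you cannot do with `sbatch`.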
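As a minimal sketch of a job array, assuming a working Slurm installation: the script below writes a hypothetical `array_job.sh` (the name, the index range 0-3, and the echoed message are illustrative) and submits it only if `sbatch` is available.

```shell
#!/usr/bin/env bash
# Write an array job script. File name and index range are illustrative.
cat > array_job.sh <<'EOF'
#!/bin/bash
#SBATCH --array=0-3
#SBATCH --ntasks=1
#SBATCH --output=array_job_%A_%a.out
# Each array element runs this script with its own index.
echo "array task ${SLURM_ARRAY_TASK_ID} running on $(hostname)"
EOF

if command -v sbatch >/dev/null 2>&1; then
    # One submission creates four array elements, indexed 0 to 3.
    sbatch array_job.sh
else
    echo "sbatch not found; array_job.sh was written but not submitted."
fi
```

In the output file name, `%A` expands to the array's job ID and `%a` to the element's index, so each element writes to its own file.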
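How are these related to each other, and how do they differ for srun vs sbatch?
All the parameters `--ntasks`, `--nodes`, `--cpus-per-task`, and `--ntasks-per-node` have the same meaning in both commands. That is true for nearly all parameters, with the notable exception of `--exclusive`.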
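How the four options relate can be illustrated with a little arithmetic. The sketch below is only a back-of-the-envelope calculation under the assumption that `--ntasks` is not given explicitly and every node runs exactly `--ntasks-per-node` tasks; the numbers mirror the two-node, two-CPU cluster from the question, and the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Values matching the cluster in the question: 2 nodes with 2 CPUs each.
nodes=2
ntasks_per_node=2
cpus_per_task=1

# Total number of tasks when every node runs ntasks_per_node tasks.
ntasks=$((nodes * ntasks_per_node))

# Each task is given cpus_per_task CPUs.
total_cpus=$((ntasks * cpus_per_task))

echo "--nodes=$nodes --ntasks-per-node=$ntasks_per_node --cpus-per-task=$cpus_per_task"
echo "tasks: $ntasks, CPUs requested: $total_cpus"
```

With these values the request fills the cluster exactly; raising `cpus_per_task` to 2 would ask for more CPUs than the four that exist.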
What is happening "under the hood" that causes this to be the case?
`srun` immediately executes the script on the remote host, while `sbatch` copies the script into internal storage and then uploads it to the compute node when the job starts. You can check this by modifying your submission script after it has been submitted; the changes will not be taken into account.
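The permission difference can be mimicked locally without Slurm. The sketch below is only an analogy, not a description of Slurm's internals: running the file directly stands in for what `srun` asks the node to do, and running a separately stored, executable copy stands in for `sbatch` keeping its own copy of the script. The names `testjob.sh` and `spooled_copy` are illustrative.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# A job script without execute permission, as in the question.
printf '#!/bin/bash\necho "hello from testjob"\n' > "$workdir/testjob.sh"
chmod a-x "$workdir/testjob.sh"

# Analogy for srun: execute the file itself. Without the execute bit
# this typically fails with "Permission denied".
if "$workdir/testjob.sh" 2>/dev/null; then
    direct_status=ok
else
    direct_status=failed
fi
echo "direct execution: $direct_status"

# Analogy for sbatch: the contents are copied elsewhere, and the copy is
# what gets run, so the permissions of the original no longer matter.
cp "$workdir/testjob.sh" "$workdir/spooled_copy"
chmod u+x "$workdir/spooled_copy"
"$workdir/spooled_copy" || echo "could not run the copy in $workdir"
```

The same copying explains why edits made to the original after submission have no effect: the queued job runs from the stored copy.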
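How do they interact with each other, and what is the "canonical" use case for each of them?
You typically use `sbatch` to submit a job and `srun` in the submission script to create job steps, as Slurm calls them. `srun` is used to launch the processes. If your program is a parallel MPI program, `srun` takes care of creating all the MPI processes. If not, `srun` will run your program as many times as specified by the `--ntasks` option. There are many use cases, depending on whether your program is parallel or not, has a long running time or not, is composed of a single executable or not, etc. Unless otherwise specified, `srun` inherits by default the pertinent options of the `sbatch` or `salloc` under which it runs.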
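A submission script with two job steps might look like the sketch below. It assumes a working Slurm installation; the file name `steps_job.sh`, the job name, and the commands run in each step are illustrative. The script is written to disk and submitted only if `sbatch` is available.

```shell
#!/usr/bin/env bash
# Write a submission script containing two job steps.
cat > steps_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=steps_demo
#SBATCH --nodes=2
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --output=steps_demo_%j.out

# Job step 1: inherits --ntasks=4 from the allocation,
# so hostname is launched four times.
srun hostname

# Job step 2: overrides the inherited value and launches a single task.
srun --ntasks=1 echo "single-task step"
EOF

if command -v sbatch >/dev/null 2>&1; then
    sbatch steps_job.sh
else
    echo "sbatch not found; steps_job.sh was written but not submitted."
fi
```

The `#SBATCH` lines define the allocation once, and each `srun` inside the script then runs as a step within it, which is the inheritance described above.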
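Specifically, would I ever use srun by itself?
Other than for small tests, no. One common exception is `srun --pty bash`, which gives you a shell on a compute node.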