How to hold up a script until a SLURM job (started with srun) is completely finished?

Problem description

I am running a job array with SLURM, with the following job array script (that I run with sbatch job_array_script.sh [args]):

#!/bin/bash

#SBATCH ... other options ...

#SBATCH --array=0-1000%200

srun ./job_slurm_script.py $1 $2 $3 $4

echo 'open' > status_file.txt

To explain, I want job_slurm_script.py to be run as an array job 1000 times with at most 200 tasks in parallel. When all of those are done, I want to write 'open' to status_file.txt. This is because in reality I have more than 10,000 jobs, which is above my cluster's MaxSubmissionLimit, so I need to split them into smaller chunks (1000-element job arrays) and run them one after the other (each one only when the previous one is finished).

However, for this to work, the echo statement must only trigger once the entire job array is finished (outside of this, I have a loop which checks status_file.txt to see if the job is finished, i.e. when the contents are the string 'open').

Up to now I thought that srun holds the script up until the whole job array is finished. However, sometimes srun "returns" and the script goes to the echo statement before the jobs are finished, so all the subsequent jobs bounce off the cluster since it goes above the submission limit.

So how do I make srun "hold up" until the whole job array is finished?

Recommended answer

You can add the flag --wait to sbatch.

Check the sbatch manual page for information about --wait.
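
As a rough illustration of how --wait could replace the status-file polling: each task of a job array runs the batch script on its own, so an echo at the end of that script fires as soon as any single task finishes, which would explain 'open' appearing early. A driver loop outside the batch script can instead block on sbatch itself. The sketch below is only one possible arrangement. The run_chunks function and the SBATCH_CMD variable are hypothetical names introduced here (SBATCH_CMD exists only so the loop can be dry-run without a cluster), and how each chunk selects its share of the work through the forwarded arguments is left out.

```shell
#!/bin/bash
# Hypothetical driver, run directly on the login node (not via sbatch).
# SBATCH_CMD is an assumed override hook for dry runs; it defaults to sbatch.
SBATCH_CMD="${SBATCH_CMD:-sbatch}"

# Usage: run_chunks N_CHUNKS [args forwarded to job_array_script.sh]
run_chunks() {
    local n_chunks chunk
    n_chunks="$1"
    shift
    chunk=0
    while [ "$chunk" -lt "$n_chunks" ]; do
        # --wait keeps sbatch in the foreground until the submitted job
        # (here: every task of the array) has terminated.
        if ! "$SBATCH_CMD" --wait job_array_script.sh "$@"; then
            echo "chunk $chunk failed, stopping" >&2
            return 1
        fi
        echo "chunk $chunk finished"
        chunk=$((chunk + 1))
    done
}
```

Calling, for example, run_chunks 10 arg1 arg2 arg3 arg4 would submit ten arrays back to back, each one only after the previous sbatch call has returned. With this arrangement the echo line in job_array_script.sh and the loop that watches status_file.txt should no longer be needed.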
