How to pass the SLURM-jobID as an input argument to python?

Question

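I am new to using Slurm to train batches of convolutional neural networks. To keep track of all the trained CNNs easily, I would like to pass the Slurm jobID as an input argument to python. Passing other variables as arguments works fine. However, I cannot get access to the Slurm jobID in order to pass it.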

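I have already tried ${SLURM_JOBID}, ${SLURM_JOB_ID}, %j and %J. I also tried writing these Slurm environment variables into a variable first and then passing that variable to Python.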

Here is my latest code:

#!/bin/bash

# --- info to user
echo "script started ... "

# --- setup environment
module purge            # clean up
module load python/3.6
module load nvidia/10.0
module load cudnn/10.0-v7 

# --- display information
HOST=`hostname`
echo "This script runs the CNN. Slurm scheduled it on node $HOST"
echo "I am interested of all environment variables Slurm adds:"
env | grep -i slurm

# --- start running ... 
echo " --- run --- "

# --- define some variables
dc="dice"
sm="softmax"

# --- run a job using a slurm batch script
for layer in {3..15..2}
  do
    sbatch -N 1 -n 1 --mem=20G --mail-type=END --gres=gpu:V100:3 --wrap="singularity --noslurm tensorflow_19.03-py3.simg python run_CNN_dynlayer.py ${SLURM_JOBID} ${layer} ${dc}"
    sleep 1 # pause 1s to be kind to the scheduler...
    echo "jobid: ${SLURM_JOBID}"
    echo " --- next --- "
  done    

The output on the command line looks like this:

femonk@rarp1 [CNN] ./run_CNN_test.slurm
script started ... 
This script runs the CNN. Slurm scheduled it on node rarp1
I am interested of all environment variables Slurm adds:
SLURM_ACCOUNT=AI
PYTHONPATH=/cluster/slurm/lib64/python3.6/site-packages:/cluster/slurm/lib64/python3.6/site-packages:/cluster/slurm/lib64/python3.6/site-packages:
 --- run --- 
Submitted batch job 3182711
jobid: 
 --- next --- 
femonk@rarp1 [CNN] 

Does anybody know what is wrong with my code? Thanks a lot in advance.

Recommended answer

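The SLURM_JOBID environment variable is made available only to the processes of the job, not to the process that submits the job. The job ID is returned by the sbatch command, so if you want it in a variable, you need to assign it.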

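  do
    # \${SLURM_JOB_ID} is escaped so that it expands inside the job, where Slurm sets it,
    # and not in this submitting shell, where it would be empty or hold the previous job's ID
    SLURM_JOBID=$(sbatch --parsable -N 1 -n 1 --mem=20G --mail-type=END --gres=gpu:V100:3 --wrap="singularity --noslurm tensorflow_19.03-py3.simg python run_CNN_dynlayer.py \${SLURM_JOB_ID} ${layer} ${dc}")
    sleep 1 # pause 1s to be kind to the scheduler...
    echo "jobid: ${SLURM_JOBID}"
    echo " --- next --- "
  done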

Note the use of command substitution $() together with the --parsable option of sbatch.
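As a minimal sketch of handling the captured value: with --parsable, sbatch prints the job ID and, if a cluster name is present, the cluster name after a semicolon. The helper name extract_jobid and the cluster name mycluster below are hypothetical, chosen only for illustration, and the sample strings stand in for real sbatch output.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): keep only the job ID from
# `sbatch --parsable` output, which is either "jobid" or "jobid;clustername".
extract_jobid() {
  local raw="$1"
  # Strip everything from the first ';' onward; a plain ID passes through unchanged.
  printf '%s\n' "${raw%%;*}"
}

# In a real script the argument would come from: raw=$(sbatch --parsable ...)
extract_jobid "3182711"
extract_jobid "3182711;mycluster"
```

Both calls print the bare job ID, so the result can be used directly in file names or passed on as an argument.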

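Also note that the Submitted batch job 3182711 line of the current output will disappear, because the output of sbatch is now captured to populate the SLURM_JOBID variable.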
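A related point is when a variable inside a double-quoted --wrap string gets expanded. The sketch below illustrates this without Slurm: bash -c stands in for the job's shell, and 4242 is a made-up job ID. It assumes only standard bash quoting rules. An unescaped ${SLURM_JOB_ID} is expanded by the submitting shell, while an escaped \${SLURM_JOB_ID} is kept literal and expanded later in the job's environment.

```shell
#!/bin/bash
# Stand-in demo: `bash -c` plays the role of the job shell; no Slurm is needed.
unset SLURM_JOB_ID   # the submitting shell has no job ID

# Unescaped: expanded right here, while the command string is being built (empty).
unescaped="echo job=${SLURM_JOB_ID}"
# Escaped: the literal text ${SLURM_JOB_ID} survives and is expanded by the child shell.
escaped="echo job=\${SLURM_JOB_ID}"

# Simulate Slurm setting the variable only in the job's environment.
SLURM_JOB_ID=4242 bash -c "$unescaped"   # prints: job=
SLURM_JOB_ID=4242 bash -c "$escaped"     # prints: job=4242
```

The same quoting choice decides whether a command submitted through sbatch --wrap sees the ID of its own job or whatever value the submitting shell happened to hold.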
