How to get the original location of a script used for a SLURM job?

Problem description

I'm starting a SLURM job with a script, and the script must work relative to its own location, which it obtains with SCRIPT_LOCATION=$(realpath $0). But SLURM copies the script to the slurmd spool folder and starts the job from there, which breaks everything that follows.

Is there any option to get the location of the script used for a SLURM job as it was before the script was moved/copied?

The script is located in the network shared folder /storage/software_folder/software_name/scripts/this_script.sh and it must:

  1. get its own location
  2. derive the software_name folder from it
  3. copy the software_name folder to the local folder /node_folder on the node
  4. run another script from the copied folder: /node_folder/software_name/scripts/launch.sh

My script is:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --partition=my_partition_name

# getting location of software_name 
SHARED_PATH=$(dirname $(dirname $(realpath $0)))
# separating the software_name from path
SOFTWARE_NAME=$(basename $SHARED_PATH)
# target location to copy project
LOCAL_SOFTWARE_FOLDER='/node_folder'
# corrected path for target
LOCAL_PATH=$LOCAL_SOFTWARE_FOLDER/$SOFTWARE_NAME

# Copying software folder from network storage to local
cp -r $SHARED_PATH $LOCAL_SOFTWARE_FOLDER
# running the script
sh $LOCAL_PATH/scripts/launch.sh

It runs perfectly when I run it on the node itself (without SLURM) via sh /storage/software/scripts/this_script.sh.

When I run it with SLURM as sbatch /storage/software/scripts/this_script.sh, it is assigned to one of the nodes, but:

  • before it runs, it is copied to /var/spool/slurmd/job_number/slurm_script, and that breaks everything because $(dirname $(dirname $(realpath $0))) then returns /var/spool/slurmd
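The effect can be reproduced without Slurm by applying the same two dirname calls to both paths. This is only an illustration: the helper name two_levels_up and the job directory name job00042 are made up for the example.

```shell
#!/bin/bash
# Strip the last two path components, as the job script does.
two_levels_up() {
    dirname "$(dirname "$1")"
}

# Path of the script on the shared storage (taken from the question).
two_levels_up /storage/software_folder/software_name/scripts/this_script.sh
# prints: /storage/software_folder/software_name

# Path of the spooled copy; "job00042" is a hypothetical job directory.
two_levels_up /var/spool/slurmd/job00042/slurm_script
# prints: /var/spool/slurmd
```

dirname works purely on the string, so neither path has to exist for this to run.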

Is it possible to get the original location (/storage/software_folder/software_name/) inside the script when it is started with SLURM?

P.S. All machines are running Fedora 30 (x64).

UPDATE 1

It was suggested to run it as sbatch -D /storage/software_folder/software_name ./scripts/this_script.sh and to use SHARED_PATH="${SLURM_SUBMIT_DIR}" inside the script. But that raises the error sbatch: error: Unable to open file ./scripts/this_script.sh.

I also tried absolute paths: sbatch -D /storage/software_folder/software_name /storage/software_folder/software_name/scripts/this_script.sh. It tries to run, but:

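  • in this case the specified folder is used only for creating the output file
  • the software still does not run
  • echo "${SLURM_SUBMIT_DIR}" inside the script prints something other than /storage/software_folder/software_name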

Any other suggestions?

UPDATE 2: I also tried #SBATCH --chdir=/storage/software_folder/software_name inside the script, but in that case echo "${SLURM_SUBMIT_DIR}" returns /home/username_who_started_script, or / if run as root.

UPDATE 3

The approach with ${SLURM_SUBMIT_DIR} worked only when the task was run as:

cd /storage/software_folder/software_name
sbatch ./scripts/this_script.sh

But that doesn't seem to be a proper solution. Are there any other ways?
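If the cd-then-sbatch invocation is used anyway, it can be wrapped in a subshell so that the caller's working directory is left untouched. A minimal sketch: the function name submit_from and the SUBMIT_CMD override are hypothetical helpers for this example, not part of Slurm.

```shell
#!/bin/bash
# submit_from DIR SCRIPT [ARGS...]
# Runs the submit command from DIR inside a subshell, so the caller's
# working directory is unchanged afterwards.
# SUBMIT_CMD is a hypothetical override for dry runs; it defaults to sbatch.
submit_from() {
    local dir=$1
    shift
    ( cd "$dir" && "${SUBMIT_CMD:-sbatch}" "$@" )
}

# Usage (on a machine with Slurm installed):
# submit_from /storage/software_folder/software_name ./scripts/this_script.sh
```

Because sbatch is then invoked from inside DIR, ${SLURM_SUBMIT_DIR} in the job should point at that folder, as in the manual cd variant.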

SOLUTION

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --partition=my_partition_name

# Check whether the script was started via Slurm or directly with bash.
# Under Slurm the variable SLURM_JOB_ID is set. The quotes are required:
# an unquoted empty variable turns the test into `[ -n ]`, which is always true.
if [ -n "$SLURM_JOB_ID" ]; then
    # Ask scontrol for the original location recorded at submit time.
    SCRIPT_PATH=$(scontrol show job "$SLURM_JOB_ID" | awk -F= '/Command=/{print $2}')
else
    # Otherwise the script was started with bash: use its real location.
    SCRIPT_PATH=$(realpath "$0")
fi

# Location of the software_name folder (two levels above the script)
SHARED_PATH=$(dirname "$(dirname "$SCRIPT_PATH")")
# Name of the software folder
SOFTWARE_NAME=$(basename "$SHARED_PATH")
# Target location for the copy on the node
LOCAL_SOFTWARE_FOLDER='/node_folder'
# Path of the copied folder on the node
LOCAL_PATH=$LOCAL_SOFTWARE_FOLDER/$SOFTWARE_NAME

# Copy the software folder from network storage to the node
cp -r "$SHARED_PATH" "$LOCAL_SOFTWARE_FOLDER"
# Run the launch script from the local copy
sh "$LOCAL_PATH/scripts/launch.sh"
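The quoting in the -n test matters: with an unquoted, empty variable the test collapses to `[ -n ]`, a one-argument test that is always true, so the scontrol branch would be taken even outside Slurm. A small demonstration that runs without Slurm (the variables unquoted and quoted exist only for this example):

```shell
#!/bin/bash
unset SLURM_JOB_ID   # simulate running outside of Slurm

# Unquoted: expands to `[ -n ]`, a one-argument test that is always true.
if [ -n $SLURM_JOB_ID ]; then unquoted=true; else unquoted=false; fi

# Quoted: expands to `[ -n "" ]`, which is false for an empty value.
if [ -n "$SLURM_JOB_ID" ]; then quoted=true; else quoted=false; fi

echo "unquoted: $unquoted, quoted: $quoted"
# prints: unquoted: true, quoted: false
```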

Recommended answer

You can get the initial (i.e. submit-time) location of the submission script from scontrol like this:

scontrol show job $SLURM_JOBID | awk -F= '/Command=/{print $2}'

So you can replace the realpath $0 part with the above. Of course, this only works inside a Slurm allocation. If you want the script to work in any situation, you will need some logic like:

if [ -n "$SLURM_JOB_ID" ] ; then
    THEPATH=$(scontrol show job "$SLURM_JOB_ID" | awk -F= '/Command=/{print $2}')
else
    THEPATH=$(realpath "$0")
fi

and then proceed with

SHARED_PATH=$(dirname $(dirname "${THEPATH}"))
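The same logic can be packaged as two small functions, which also makes the parsing step testable without a cluster. This is a sketch under assumptions, not a definitive implementation: the function names command_from_scontrol and resolve_script_path are invented here, and the parser assumes the field has the form "Command=/path/to/script [args...]", so anything after the first space is treated as arguments and dropped.

```shell
#!/bin/bash
# Read `scontrol show job` output on stdin and print the script path.
# Assumption: the field looks like "Command=/path/to/script [args...]".
# Everything after the first whitespace is dropped, so a script path
# containing spaces would be truncated.
command_from_scontrol() {
    sed -n 's/^[[:space:]]*Command=\([^[:space:]]*\).*/\1/p' | head -n 1
}

# Print the original script location: ask scontrol inside a Slurm job,
# otherwise fall back to realpath on the given path (normally "$0").
resolve_script_path() {
    local fallback=$1 path=""
    if [ -n "${SLURM_JOB_ID:-}" ] && command -v scontrol >/dev/null 2>&1; then
        path=$(scontrol show job "$SLURM_JOB_ID" | command_from_scontrol)
    fi
    if [ -z "$path" ]; then
        # Not under Slurm, or scontrol returned nothing usable.
        path=$(realpath "$fallback")
    fi
    printf '%s\n' "$path"
}

# Usage inside the job script:
# THEPATH=$(resolve_script_path "$0")
```

Note that if the scontrol lookup fails inside a job, the fallback resolves to the spooled copy again, so a stricter script might prefer to exit with an error in that case.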
