How to get the original location of a script used for a SLURM job?
Problem description
I'm starting a SLURM job with a script, and the script must work depending on its own location, which is obtained inside the script itself with SCRIPT_LOCATION=$(realpath $0). But SLURM copies the script to the slurmd folder and starts the job from there, which breaks the subsequent steps.
Is there any option to get the location of the script used for the SLURM job before it is moved/copied?
The script is located in a network shared folder, /storage/software_folder/software_name/scripts/this_script.sh, and it must:
- get its own location
- return the software_name folder
- copy the software_name folder to a local folder /node_folder on the node
- run another script from the copied folder: /node_folder/software_name/scripts/launch.sh
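As a rough illustration of the first two steps in the list above, the following sketch derives the software folder and its name from a script path. The helper name software_root is hypothetical and not part of the original script; it only manipulates strings, so the path does not have to exist on disk.

```shell
#!/bin/bash
# Hypothetical helper (an assumption for illustration): given the path of a
# script stored in <software folder>/scripts/, print the software folder.
software_root() {
    # First dirname strips the script name, the second strips "scripts".
    dirname "$(dirname "$1")"
}

script=/storage/software_folder/software_name/scripts/this_script.sh
shared_path=$(software_root "$script")
software_name=$(basename "$shared_path")

echo "$shared_path"    # /storage/software_folder/software_name
echo "$software_name"  # software_name
```

The remaining steps (copying to /node_folder and running launch.sh) depend on the cluster layout, so they are left out of this sketch.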
My script is:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --partition=my_partition_name
# getting location of software_name
SHARED_PATH=$(dirname $(dirname $(realpath $0)))
# separating the software_name from path
SOFTWARE_NAME=$(basename $SHARED_PATH)
# target location to copy project
LOCAL_SOFTWARE_FOLDER='/node_folder'
# corrected path for target
LOCAL_PATH=$LOCAL_SOFTWARE_FOLDER/$SOFTWARE_NAME
# Copying software folder from network storage to local
cp -r $SHARED_PATH $LOCAL_SOFTWARE_FOLDER
# running the script
sh $LOCAL_PATH/scripts/launch.sh
It runs perfectly when I run it on the node itself (without using SLURM) via:
sh /storage/software/scripts/this_script.sh
When it is run with SLURM as
sbatch /storage/software/scripts/this_script.sh
it is assigned to one of the nodes, but:
- before it runs, it is copied to /var/spool/slurmd/job_number/slurm_script, and that breaks everything, since $(dirname $(dirname $(realpath $0))) returns /var/spool/slurmd
Is it possible to get the original location (/storage/software_folder/software_name/) inside the script when it is started with SLURM?
P.S. All machines are running Fedora 30 (x64).
UPDATE 1
There was a suggestion to run it as
sbatch -D /storage/software_folder/software_name ./scripts/this_script.sh
and to use SHARED_PATH="${SLURM_SUBMIT_DIR}" inside the script itself. But that raises the error:
sbatch: error: Unable to open file ./scripts/this_script.sh
I also tried using absolute paths:
sbatch -D /storage/software_folder/software_name /storage/software_folder/software_name/scripts/this_script.sh
It tries to run, but:
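- in this case it uses the specified folder only to create the output file
- the software still does not run
- echo "${SLURM_SUBMIT_DIR}" inside the script prints something other than /storage/software_folder/software_name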
Are there any other suggestions?
UPDATE 2
I also tried using #SBATCH --chdir=/storage/software_folder/software_name inside the script, but in that case echo "${SLURM_SUBMIT_DIR}" returns /home/username_who_started_script, or / if run as root.
UPDATE 3
The approach with ${SLURM_SUBMIT_DIR} worked only if the task is run as:
cd /storage/software_folder/software_name
sbatch ./scripts/this_script.sh
But that doesn't seem to be a proper solution. Are there any other ways?
Solution
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --partition=my_partition_name
# Check whether the script was started via SLURM or directly with bash.
# Under SLURM the variable $SLURM_JOB_ID is set. The quotes matter: with an
# unquoted empty variable the test collapses to `[ -n ]`, which is always true.
if [ -n "$SLURM_JOB_ID" ]; then
    # ask scontrol for the original (submit-time) location of the script
    SCRIPT_PATH=$(scontrol show job "$SLURM_JOB_ID" | awk -F= '/Command=/{print $2}')
else
    # otherwise: started with bash. Get the real location.
    SCRIPT_PATH=$(realpath "$0")
fi
# getting location of software_name
SHARED_PATH=$(dirname "$(dirname "$SCRIPT_PATH")")
# separating the software_name from path
SOFTWARE_NAME=$(basename "$SHARED_PATH")
# target location to copy project
LOCAL_SOFTWARE_FOLDER='/node_folder'
# corrected path for target
LOCAL_PATH=$LOCAL_SOFTWARE_FOLDER/$SOFTWARE_NAME
# Copying software folder from network storage to local
cp -r "$SHARED_PATH" "$LOCAL_SOFTWARE_FOLDER"
# running the script
sh "$LOCAL_PATH/scripts/launch.sh"
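A side note on the -n test used to detect SLURM: inside [ ... ] the variable needs to be double-quoted, otherwise an unset or empty variable disappears during word splitting and the test always succeeds. The small sketch below shows the difference; MAYBE_EMPTY is a placeholder variable chosen for this illustration.

```shell
#!/bin/bash
# MAYBE_EMPTY stands in for a variable such as SLURM_JOB_ID outside of a job.
unset MAYBE_EMPTY

# Unquoted: expands to `[ -n ]`, a one-argument test that is always true.
if [ -n $MAYBE_EMPTY ]; then unquoted=true; else unquoted=false; fi

# Quoted: expands to `[ -n "" ]`, which is false for an empty string.
if [ -n "$MAYBE_EMPTY" ]; then quoted=true; else quoted=false; fi

echo "unquoted: $unquoted, quoted: $quoted"  # unquoted: true, quoted: false
```

Without the quotes, the script would take the scontrol branch even when started directly with sh or bash.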
Recommended answer
You can get the initial (i.e. at submit time) location of the submission script from scontrol like this:
scontrol show job $SLURM_JOBID | awk -F= '/Command=/{print $2}'
So you can replace the realpath $0 part with the above. Of course, this will only work within a Slurm allocation. So if you want the script to work in any situation, you will need some logic like:
if [ -n "$SLURM_JOB_ID" ] ; then
    THEPATH=$(scontrol show job "$SLURM_JOB_ID" | awk -F= '/Command=/{print $2}')
else
    THEPATH=$(realpath "$0")
fi
and then proceed with
SHARED_PATH=$(dirname $(dirname "${THEPATH}"))
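To try the awk filter without access to a cluster, it can be wrapped in a function that reads scontrol-style text from standard input. This is only a sketch: extract_command is a hypothetical name, and the sample text is an abbreviated, made-up stand-in for real scontrol show job output, which contains many more fields.

```shell
#!/bin/bash
# Hypothetical helper: print the value of the Command= field found in
# "scontrol show job"-style text supplied on stdin.
extract_command() {
    awk -F= '/Command=/{print $2}'
}

# Illustrative sample only (job id and fields are assumptions).
sample='JobId=42 JobName=this_script.sh
   Command=/storage/software_folder/software_name/scripts/this_script.sh
   WorkDir=/home/username'

script_path=$(printf '%s\n' "$sample" | extract_command)
shared_path=$(dirname "$(dirname "$script_path")")

echo "$script_path"
echo "$shared_path"
```

Because the field separator is =, this filter assumes the script path itself contains no = character.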