How to get rank in aprun
Problem description
I am trying to run a multi-node job with aprun. However, I can't figure out how to get the rank (or whatever serves as the ID of each job) in a bash environment. Take this simple job:
aprun -n 8 -N 2 ./examplebashscript.sh
How can I get the rank in each spawned job? Without a rank or some other unique job ID, this aprun line will just run the exact same program 8 times (the instance count is set by -n), which is undesirable.
I've been reading the documentation and, surprisingly, I couldn't find anything that even explains the default variables provided by aprun.
I've worked with mpirun before, where I know how to get the rank of each job from C and Python programs, but not from Bash. aprun is even less documented.
Recommended answer
Try looking for the environment variable ALPS_APP_PE in the bash script that you launch with aprun.
It will be different for each instance of the script (the number of instances created is given by the -n option of the aprun command).
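A minimal sketch of how the script from the question might read that variable. The fallback to 0 when ALPS_APP_PE is unset (for example when the script is run by hand outside aprun) and the input file naming are assumptions made for illustration, not something ALPS prescribes.

```bash
#!/bin/bash
# examplebashscript.sh, launched as: aprun -n 8 -N 2 ./examplebashscript.sh

# Print this instance's rank as exported by ALPS.
# Assumption: fall back to 0 when ALPS_APP_PE is not set.
get_rank() {
    printf '%s\n' "${ALPS_APP_PE:-0}"
}

rank="$(get_rank)"

# Hypothetical per-rank work split: each instance picks its own input file.
input_file="input_${rank}.dat"

echo "rank ${rank} on ${HOSTNAME:-unknown}: would process ${input_file}"
```

Each instance then sees its own value of `rank`, so the same script can branch on it or select different inputs.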
If the script subsequently executes an instance of an MPI program, that instance's MPI rank will be the value given by ALPS_APP_PE.
The caveat is that some Cray sites may decide not to expose this variable, or to use a different name. Very old ALPS versions also don't support it, but these are rare.
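Given that caveat, a script can check for the variable instead of assuming it exists. The sketch below tries ALPS_APP_PE first and then a few rank variables set by other launchers (PMI_RANK, OMPI_COMM_WORLD_RANK, SLURM_PROCID). That candidate list is an assumption: the answer does not say which alternative names a Cray site might use, so adjust it to whatever your site documents.

```bash
#!/bin/bash
# Print the value of the first rank-like variable that is set.
# Returns 1 if none of the candidates is set.
# Assumption: the names after ALPS_APP_PE come from other launchers
# and may not match what a particular Cray site exports.
find_rank() {
    local name
    for name in ALPS_APP_PE PMI_RANK OMPI_COMM_WORLD_RANK SLURM_PROCID; do
        # ${!name} is bash indirect expansion: the value of the variable named by $name
        if [ -n "${!name:-}" ]; then
            printf '%s\n' "${!name}"
            return 0
        fi
    done
    return 1
}

if rank="$(find_rank)"; then
    echo "using rank ${rank}"
else
    echo "no rank variable found; ask the site which name is exported" >&2
fi
```

Running `env | grep -i -e alps -e rank` inside the launched script is another quick way to see what a given site actually exports.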
See this CUG 2014 paper for an example:
https://cug.org/proceedings/cug2014_proceedings/includes/files/pap136.pdf