How to launch multithreaded MPI processes in LSF?

Question

I want to use LSF to submit a job which:

  • runs in parallel on 4 nodes
  • has only one MPI process per node
  • has 12 threads per process

In the absence of LSF, I would simply launch with mpi on 4 nodes, like:

mpirun -hosts host1,host2,host3,host4 -np 4 ./myprocess --numthreads=12

However, with LSF in place, I can't see how to do this. I'm sure there's a very standard way to do it, but I'm quite new to LSF. I googled around, but the answer wasn't immediately obvious to me. I found "Hybrid MPI/OpenMP in LSF", but it doesn't seem to be quite the same: it seems to need only a single host at a time.

Answer

The other question that you have linked to gives you exactly what you need, but you have to adapt it slightly, as it is written for OpenMP applications whose number of threads is controlled by the OMP_NUM_THREADS environment variable.

Here are the most important parts of the job script:

  • #BSUB -n 4 - request 4 slots
  • #BSUB -R "span[ptile=1]" - request that slots are distributed one per node; this option in combination with the previous one spans the job over 4 different nodes and instructs LSF to put one slot per host in the generated host file
  • #BSUB -x - request exclusive access to the nodes

The three options above instruct LSF to allocate 4 nodes and reserve one slot on each of them. Because exclusive access is also requested, no other jobs will share those nodes with your job, so you can start as many threads per node as you like. Then all you need is to call Open MPI's mpiexec: if LSF integration was compiled into your Open MPI build, it will automatically pick up the host list from LSF and start one process per node.

A sample LSF job file would look like this:

#BSUB -n 4
#BSUB -R "span[ptile=1]"
#BSUB -x

mpiexec -np 4 ./myprocess --numthreads=12
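
Whether the sample above works as written depends on Open MPI having been built with LSF support. One rough way to check is to look for an lsf component in the output of ompi_info. The helper below is only a sketch: the function name ompi_has_lsf is made up for illustration, and the pattern assumes that ompi_info lists LSF support as "MCA ras: lsf" or "MCA plm: lsf" lines, which may differ between Open MPI versions.

```shell
# Hypothetical helper (the name is an assumption, not part of Open MPI).
# Reads `ompi_info` output on stdin and succeeds if an LSF component
# appears under the ras or plm framework.
ompi_has_lsf() {
    grep -Eq 'MCA (ras|plm): lsf'
}

# Possible usage on a login node:
#   ompi_info | ompi_has_lsf && echo "LSF support found" || echo "no LSF support"
```

If the check finds nothing, mpiexec will most likely not see the LSF allocation on its own, and an explicit host file is needed.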

Make sure you also request enough run time with the -W option and a sufficient amount of memory with the -M option. Memory in LSF (as well as in most other distributed resource managers) is requested per slot, therefore you should specify the maximum amount of memory that any instance of ./myprocess would consume.

If LSF integration is not compiled into your Open MPI distribution, the process is somewhat more involved, as you would have to parse the LSF hosts file and create an Open MPI hosts file from it.
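
As a rough sketch of that conversion, the function below turns a file with one hostname per allocated slot into "hostname slots=N" lines for an Open MPI host file. It rests on assumptions the answer does not spell out: that LSF exposes such a file through the LSB_DJOB_HOSTFILE environment variable, and that the function name lsf_to_ompi_hostfile and the output file name in the usage comment are purely illustrative.

```shell
# Hypothetical helper (the name is an assumption): convert an LSF per-slot
# host list (one hostname per line, repeated once per slot) into Open MPI
# hostfile lines of the form "hostname slots=N".
# Usage: lsf_to_ompi_hostfile <lsf_hostfile> > <ompi_hostfile>
lsf_to_ompi_hostfile() {
    # Count how often each hostname occurs, keeping first-seen order.
    awk 'NF {
             if (!($1 in count)) order[++n] = $1
             count[$1]++
         }
         END {
             for (i = 1; i <= n; i++)
                 printf "%s slots=%d\n", order[i], count[order[i]]
         }' "$1"
}

# Possible usage inside the job script (file name is illustrative):
#   lsf_to_ompi_hostfile "$LSB_DJOB_HOSTFILE" > "hosts.$LSB_JOBID"
#   mpiexec --hostfile "hosts.$LSB_JOBID" -np 4 ./myprocess --numthreads=12
```

With span[ptile=1] each host should appear only once in the LSF list, so every generated line would read slots=1 and mpiexec would place one process per node.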
