MPI & pthreads: nodes with different numbers of cores

Question

Introduction

I want to write a hybrid MPI/pthreads code. My goal is to have one MPI process started on each node and have each of those processes split into multiple threads that will actually do the job, but with communication only happening between the separate MPI processes.

There are quite a few tutorials describing this situation, called hybrid programming, but they typically assume a homogeneous cluster. However, the one I am using has heterogeneous nodes: they have different processors and different numbers of cores, i.e. the nodes are a combination of 4/8/12/16 core machines.

I am aware that running an MPI process across this cluster will make my code slow down to the speed of the slowest CPU used; I accept that fact. With that I would like to get to my question.

Is there a way to start N MPI processes, exactly one per node, and let each process find out how many physical cores are available on its node?

The MPI implementation I have access to is OpenMPI. The nodes are a mix of Intel and AMD CPUs. I thought of using a machinefile with each node specified as having one slot, then figuring out the number of cores locally. However, there seem to be problems with doing that. I am surely not the first person with this problem, but somehow searching the web didn't point me in the right direction yet. Is there a standard way of solving this problem other than finding oneself a homogeneous cluster?
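A machinefile of the kind described, with each node restricted to a single slot, might look like this in Open MPI's hostfile syntax (the hostnames are hypothetical):

```
node01 slots=1
node02 slots=1
node03 slots=1
```

It would then be passed as `mpiexec --hostfile machinefile ./mympiprogram`.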

Answer

Launching one process only per node is very simple with Open MPI:

mpiexec -pernode ./mympiprogram

The -pernode argument is equivalent to -npernode 1 and instructs the ORTE launcher to start one process per node present in the host list. This method has the advantage that it works regardless of how the actual host list is provided, i.e. both when it comes from tight integration with a resource manager (e.g. Torque/PBS, SGE, LSF, SLURM, etc.) and when the hosts are supplied manually. It also works even if the host list contains nodes with multiple slots.

Knowing the number of cores is a bit tricky and very OS-specific. But Open MPI ships with the hwloc library which provides an abstract API to query the system components, including the number of cores:

#include <hwloc.h>

hwloc_topology_t topology;

/* Allocate and initialize topology object. */
hwloc_topology_init(&topology);

/* Perform the topology detection. */
hwloc_topology_load(topology);

/* Get the number of cores. */
unsigned nbcores = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_CORE);

/* Destroy topology object. */
hwloc_topology_destroy(topology);

If you want to make the number of cores across the cluster available to each MPI process in your job, a simple MPI_Allgather is what you need:

/* Obtain the number of MPI processes in the job */
int nranks;
MPI_Comm_size(MPI_COMM_WORLD, &nranks);

unsigned cores[nranks];
MPI_Allgather(&nbcores, 1, MPI_UNSIGNED,
              cores, 1, MPI_UNSIGNED, MPI_COMM_WORLD);
