How to programmatically detect the number of cores and run an MPI program using all cores

Question

I do not want to use mpiexec -n 4 ./a.out to run my program on my core i7 processor (with 4 cores). Instead, I want to run ./a.out, have it detect the number of cores and fire up MPI to run a process per core.

This SO question and answer, MPI Number of processors?, led me to use mpiexec.

The reason I want to avoid mpiexec is because my code is destined to be a library inside a larger project I'm working on. The larger project has a GUI and the user will be starting long computations that will call my library, which will in turn use MPI. The integration between the UI and the computation code is not trivial... so launching an external process and communicating via a socket or some other means is not an option. It must be a library call.

Is this possible? How do I do it?

Answer

This is quite a nontrivial thing to achieve in general. Also, there is hardly any portable solution that does not depend on some MPI implementation specifics. What follows is a sample solution that works with Open MPI and possibly with other general MPI implementations (MPICH, Intel MPI, etc.). It involves a second executable or a means for the original executable to directly call your library when given some special command-line argument. It goes like this.

Assume the original executable was started simply as ./a.out. When your library function is called, it calls MPI_Init(NULL, NULL), which initialises MPI. Since the executable was not started via mpiexec, it falls back to the so-called singleton MPI initialisation, i.e. it creates an MPI job that consists of a single process. To perform distributed computations, you have to start more MPI processes and that's where things get complicated in the general case.

MPI supports dynamic process management, in which one MPI job can start a second one and communicate with it using intercommunicators. This happens when the first job calls MPI_Comm_spawn or MPI_Comm_spawn_multiple. The first one is used to start simple MPI jobs that use the same executable for all MPI ranks, while the second one can start jobs that mix different executables. Both need information as to where and how to launch the processes. This comes from the so-called MPI universe, which provides information not only about the started processes, but also about the slots available for dynamically started ones. The universe is constructed by mpiexec or by some other launcher mechanism that takes, e.g., a host file with a list of nodes and the number of slots on each node. In the absence of such information, some MPI implementations (Open MPI included) will simply start the executables on the same node as the original process. MPI_Comm_spawn[_multiple] has an MPI_Info argument that can be used to supply a list of key-value pairs with implementation-specific information. Open MPI supports the add-hostfile key that can be used to specify a hostfile to be used when spawning the child job. This is useful for, e.g., allowing the user to specify via the GUI a list of hosts to use for the MPI computation. But let's concentrate on the case where no such information is provided and Open MPI simply runs the child job on the same host.
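
As an aside to the previous point, here is a minimal sketch of what passing such a hostfile through the MPI_Info argument could look like. The file name hosts.txt and the count nworkers are placeholders, and the add-hostfile key is Open MPI-specific as noted above:

MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "add-hostfile", "hosts.txt");

MPI_Comm child_comm;
MPI_Comm_spawn("./worker", MPI_ARGV_NULL, nworkers, info, 0,
               MPI_COMM_SELF, &child_comm, MPI_ERRCODES_IGNORE);
MPI_Info_free(&info);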

Assume the worker executable is called worker, or that the original executable can serve as a worker when called with some special command-line option, -worker for example. If you want to perform the computation with N processes in total, you need to launch N-1 workers. This is simple:

(separate executable)

MPI_Comm child_comm;
MPI_Comm_spawn("./worker", MPI_ARGV_NULL, N-1, MPI_INFO_NULL, 0,
               MPI_COMM_SELF, &child_comm, MPI_ERRCODES_IGNORE);

(same executable, with an option)

MPI_Comm child_comm;
char *argv[] = { "-worker", NULL };
MPI_Comm_spawn("./a.out", argv, N-1, MPI_INFO_NULL, 0,
               MPI_COMM_SELF, &child_comm, MPI_ERRCODES_IGNORE);

If everything goes well, child_comm will be set to the handle of an intercommunicator that can be used to communicate with the new job. As intercommunicators are somewhat tricky to use and the parent-child job division requires complex program logic, one could simply merge the two sides of the intercommunicator into a "big world" communicator that replaces MPI_COMM_WORLD. On the parent's side:

MPI_Comm bigworld;
MPI_Intercomm_merge(child_comm, 0, &bigworld);

On the child's side:

MPI_Comm parent_comm, bigworld;
MPI_Comm_get_parent(&parent_comm);
MPI_Intercomm_merge(parent_comm, 1, &bigworld);

After the merge is complete, all processes can communicate using bigworld instead of MPI_COMM_WORLD. Note that child jobs do not share their MPI_COMM_WORLD with the parent job.

To put it all together, here is a complete working example with two separate programs.

main.c

#include <stdio.h>
#include <mpi.h>

int main (void)
{
   MPI_Init(NULL, NULL);

   printf("[main] Spawning workers...\n");

   MPI_Comm child_comm;
   MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                  MPI_COMM_SELF, &child_comm, MPI_ERRCODES_IGNORE);

   MPI_Comm bigworld;
   MPI_Intercomm_merge(child_comm, 0, &bigworld);

   int size, rank;
   MPI_Comm_rank(bigworld, &rank);
   MPI_Comm_size(bigworld, &size);
   printf("[main] Big world created with %d ranks\n", size);

   // Perform some computation
   int data = 1, result;
   MPI_Bcast(&data, 1, MPI_INT, 0, bigworld);
   data *= (1 + rank);
   MPI_Reduce(&data, &result, 1, MPI_INT, MPI_SUM, 0, bigworld);
   printf("[main] Result = %d\n", result);

   MPI_Barrier(bigworld);

   MPI_Comm_free(&bigworld);
   MPI_Comm_free(&child_comm);

   MPI_Finalize();
   printf("[main] Shutting down\n");
   return 0;
}

worker.c

#include <stdio.h>
#include <mpi.h>

int main (void)
{
   MPI_Init(NULL, NULL);

   MPI_Comm parent_comm;
   MPI_Comm_get_parent(&parent_comm);

   int rank, size;
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   printf("[worker] %d of %d here\n", rank, size);

   MPI_Comm bigworld;
   MPI_Intercomm_merge(parent_comm, 1, &bigworld);

   MPI_Comm_rank(bigworld, &rank);
   MPI_Comm_size(bigworld, &size);
   printf("[worker] %d of %d in big world\n", rank, size);

   // Perform some computation
   int data;
   MPI_Bcast(&data, 1, MPI_INT, 0, bigworld);
   data *= (1 + rank);
   MPI_Reduce(&data, NULL, 1, MPI_INT, MPI_SUM, 0, bigworld);

   printf("[worker] Done\n");
   MPI_Barrier(bigworld);

   MPI_Comm_free(&bigworld);
   MPI_Comm_free(&parent_comm);

   MPI_Finalize();
   return 0;
}

And here is how it works:

$ mpicc -o main main.c
$ mpicc -o worker worker.c
$ ./main
[main] Spawning workers...
[worker] 0 of 2 here
[worker] 1 of 2 here
[worker] 1 of 3 in big world
[worker] 2 of 3 in big world
[main] Big world created with 3 ranks
[worker] Done
[worker] Done
[main] Result = 6
[main] Shutting down

The child job has to use MPI_Comm_get_parent to obtain the intercommunicator to the parent job. When a process is not part of such a child job, the returned value will be MPI_COMM_NULL. This allows for an easy way to implement both the main program and the worker in the same executable. Here is a hybrid example:

#include <stdio.h>
#include <mpi.h>

MPI_Comm bigworld_comm = MPI_COMM_NULL;
MPI_Comm other_comm = MPI_COMM_NULL;

int parlib_init (const char *argv0, int n)
{
    MPI_Init(NULL, NULL);

    MPI_Comm_get_parent(&other_comm);
    if (other_comm == MPI_COMM_NULL)
    {
        printf("[main] Spawning workers...\n");
        MPI_Comm_spawn(argv0, MPI_ARGV_NULL, n-1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &other_comm, MPI_ERRCODES_IGNORE);
        MPI_Intercomm_merge(other_comm, 0, &bigworld_comm);
        return 0;
    }

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("[worker] %d of %d here\n", rank, size);
    MPI_Intercomm_merge(other_comm, 1, &bigworld_comm);
    return 1;
}

int parlib_dowork (void)
{
    int data = 1, result = -1, size, rank;

    MPI_Comm_rank(bigworld_comm, &rank);
    MPI_Comm_size(bigworld_comm, &size);

    if (rank == 0)
    {
        printf("[main] Doing work with %d processes in total\n", size);
        data = 1;
    }

    MPI_Bcast(&data, 1, MPI_INT, 0, bigworld_comm);
    data *= (1 + rank);
    MPI_Reduce(&data, &result, 1, MPI_INT, MPI_SUM, 0, bigworld_comm);

    return result;
}

void parlib_finalize (void)
{
    MPI_Comm_free(&bigworld_comm);
    MPI_Comm_free(&other_comm);
    MPI_Finalize();
}

int main (int argc, char **argv)
{
    if (parlib_init(argv[0], 4))
    {
        // Worker process
        (void)parlib_dowork();
        printf("[worker] Done\n");
        parlib_finalize();
        return 0;
    }

    // Main process
    // Show GUI, save the world, etc.
    int result = parlib_dowork();
    printf("[main] Result = %d\n", result);
    parlib_finalize();

    printf("[main] Shutting down\n");
    return 0;
}

And here is a sample output:

$ mpicc -o hybrid hybrid.c
$ ./hybrid
[main] Spawning workers...
[worker] 0 of 3 here
[worker] 2 of 3 here
[worker] 1 of 3 here
[main] Doing work with 4 processes in total
[worker] Done
[worker] Done
[main] Result = 10
[worker] Done
[main] Shutting down
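
Both examples hardcode the total number of processes (three in the two-executable case, four in the hybrid one). To actually use all cores, as asked in the question, the library can query the core count at run time and pass it as N. A minimal sketch, assuming a POSIX system where sysconf(_SC_NPROCESSORS_ONLN) is available (the helper name is illustrative):

#include <unistd.h>

/* Returns the number of online cores, or 1 if the query fails. */
static int detect_num_cores (void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return (n > 0) ? (int)n : 1;
}

The result could then be used as, e.g., parlib_init(argv[0], detect_num_cores()).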

Some things to keep in mind when designing such parallel libraries:

  • MPI can only be initialised once. If necessary, call MPI_Initialized to check whether the library has already been initialised.
  • MPI can only be finalised once. Again, MPI_Finalized is your friend. It can be used in something like an atexit() handler to implement a universal MPI finalisation on program exit.
  • When used in threaded contexts (usually the case when GUIs are involved), MPI must be initialised with support for threads. See MPI_Init_thread. A guarded initialisation/finalisation sketch follows this list.
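
The last two points can be combined into a small guard. A minimal sketch, assuming the surrounding application may have initialised MPI already and that MPI_THREAD_FUNNELED is sufficient for the library's needs (the function names are illustrative, not part of any existing API):

#include <stdlib.h>
#include <mpi.h>

/* Finalise MPI exactly once; suitable for registration with atexit(). */
static void finalize_mpi_once (void)
{
    int finalized;
    MPI_Finalized(&finalized);
    if (!finalized)
        MPI_Finalize();
}

/* Initialise MPI exactly once, requesting thread support for GUI contexts. */
static void init_mpi_once (void)
{
    int initialized;
    MPI_Initialized(&initialized);
    if (!initialized)
    {
        int provided;
        MPI_Init_thread(NULL, NULL, MPI_THREAD_FUNNELED, &provided);
        atexit(finalize_mpi_once);
    }
}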
