MPI_Comm_spawn and MPI_Reduce

Question

I have two programs. The "master" spawns "workers" that perform some calculations, and I want the master to get the results from the workers and store the sum. I am trying to use MPI_Reduce to collect the results from the workers, and the workers use MPI_Reduce to send to the master's MPI_Comm. I am not sure if that is correct. Here are my programs:

Master:

#include <mpi.h>
#include <iostream>
using namespace std;

int main(int argc, char *argv[]) { 
    int world_size, universe_size, *universe_sizep, flag; 

    int rc, send, recv;

    // intercommunicator
    MPI_Comm everyone;

    MPI_Init(&argc, &argv); 
    MPI_Comm_size(MPI_COMM_WORLD, &world_size); 

    if (world_size != 1) {
        cout << "Top heavy with management" << endl;
    } 

    MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &universe_sizep, &flag);  
    if (!flag) { 
        cout << "This MPI does not support UNIVERSE_SIZE. How many processes total?";
        cout << "Enter the universe size: ";
        cin >> universe_size; 
    } else {
        universe_size = *universe_sizep;
    }
    if (universe_size == 1) {
        cout << "No room to start workers" << endl;
    }

    MPI_Comm_spawn("so_worker", MPI_ARGV_NULL, universe_size-1,  
             MPI_INFO_NULL, 0, MPI_COMM_SELF, &everyone,  
             MPI_ERRCODES_IGNORE);

    send = 0;

    rc = MPI_Reduce(&send, &recv, 1, MPI_INT, MPI_SUM, 0, everyone);

    // store result of recv ...
    // other calculations here
    cout << "From spawned workers recv: " << recv << endl;

    MPI_Finalize(); 
    return 0; 
}

Worker:

#include <mpi.h>
#include <iostream>
using namespace std;

int main(int argc, char *argv[]) { 

    int rc, send,recv;


    int parent_size, parent_id, my_id, numprocs; 
    // parent intercomm
    MPI_Comm parent; 
    MPI_Init(&argc, &argv); 

    MPI_Comm_get_parent(&parent); 
    if (parent == MPI_COMM_NULL) {
        cout << "No parent!" << endl;
    }
    MPI_Comm_remote_size(parent, &parent_size); 
    MPI_Comm_rank(parent, &parent_id) ; 
    //cout << "Parent is of size: " << size << endl;
    if (parent_size != 1) {
        cout << "Something's wrong with the parent" << endl;
    }

    MPI_Comm_rank(MPI_COMM_WORLD, &my_id) ;     
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs) ;  

    cout << "I'm child process rank "<< my_id << " and we are " << numprocs << endl;
    cout << "The parent process rank "<< parent_id << " and we are " << parent_size << endl;

    // get value of send
    send = 7; // just an example
    recv = 0;

    rc = MPI_Reduce(&send, &recv, 1, MPI_INT, MPI_SUM, parent_id, parent);
    if (rc != MPI_SUCCESS)
        cout << my_id << " failure on mpi_reduce in WORKER" << endl;

    MPI_Finalize(); 
    return 0; 
} 

I compiled both and executed them like this (mpic++ on OS X):

mpic++ so_worker.cpp -o so_worker
mpic++ so_master.cpp -o so_master
mpirun -n 1 so_master

Is this the correct way to run a master that spawns the workers?

In the master I always get 0 back from MPI_Reduce. Can I use MPI_Reduce with intercommunicators, or should I use MPI_Send in the workers and MPI_Recv in the master? I'm really not sure why it's not working.

Any help would be appreciated. Thanks!

Answer

MPI_Comm_get_parent returns the parent intercommunicator that encompasses the original process and all the spawned ones. In this case calling MPI_Comm_rank(parent, &parent_id) does not return the rank of the parent but rather the rank of the current process in the local group of the intercommunicator:

I'm child process rank 0 and we are 3
The parent process **rank 0** and we are 1
I'm child process rank 1 and we are 3
The parent process **rank 1** and we are 1
I'm child process rank 2 and we are 3
The parent process **rank 2** and we are 1
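
For illustration, a small sketch of the worker-side calls that make this distinction explicit (it assumes only the parent handle obtained above via MPI_Comm_get_parent): MPI_Comm_rank and MPI_Comm_size on the intercommunicator refer to the local group of workers, while MPI_Comm_remote_size refers to the remote group of masters.

//
// Worker code (illustrative sketch)
//
int local_rank, local_size, remote_size;
MPI_Comm_rank(parent, &local_rank);          // rank of this worker in the local (workers') group
MPI_Comm_size(parent, &local_size);          // number of workers in the local group
MPI_Comm_remote_size(parent, &remote_size);  // number of masters in the remote group (1 here)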

That's why the MPI_Reduce() call would not succeed, as all worker processes specify different values for the root rank. Since there was originally one master process, its rank in the remote group of parent would be 0, and hence all workers should specify 0 as the root to MPI_Reduce:

//
// Worker code
//
rc = MPI_Reduce(&send, &recv, 1, MPI_INT, MPI_SUM, 0, parent);

This is only half of the problem. The other half is that rooted collective operations (e.g. MPI_REDUCE) behave a bit differently with intercommunicators. One first has to decide which of the two groups hosts the root. Once the root group is identified, the root process has to pass MPI_ROOT as the value of root in MPI_REDUCE, and all other processes in the root group must pass MPI_PROC_NULL; that is, apart from the root, the processes in the receiving group do not take part in the rooted collective operation at all. Since the master code is written so that there can be only one process in the master's group, it suffices to change the call to MPI_Reduce in the master code to:

//
// Master code
//
rc = MPI_Reduce(&send, &recv, 1, MPI_INT, MPI_SUM, MPI_ROOT, everyone);
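
For completeness, here is a hypothetical sketch of the general pattern in the root group if it contained more than one process (not the case in this question, where the master runs alone): only the designated root passes MPI_ROOT, while every other process in that group passes MPI_PROC_NULL.

//
// Master code (hypothetical sketch with several masters)
//
int local_rank;
MPI_Comm_rank(everyone, &local_rank);   // rank within the masters' local group
int root_arg = (local_rank == 0) ? MPI_ROOT : MPI_PROC_NULL;
rc = MPI_Reduce(&send, &recv, 1, MPI_INT, MPI_SUM, root_arg, everyone);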

Note that the master also does not contribute data to the reduction itself, i.e. the value of sendbuf (&send in this case) is irrelevant, as the root does not send data to be reduced; it merely collects the result of the reduction performed over the values from the processes in the remote group.
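
As a quick sanity check, a sketch under the assumption that the rest of both programs stays as posted: the master's send value can be any placeholder since it is never read when root is MPI_ROOT, and with the three workers from the output above each contributing 7, recv on the master ends up as 21.

//
// Master code (sketch)
//
send = -1;   // arbitrary placeholder; not read because root == MPI_ROOT
rc = MPI_Reduce(&send, &recv, 1, MPI_INT, MPI_SUM, MPI_ROOT, everyone);
cout << "From spawned workers recv: " << recv << endl;   // prints 21 with three workers each sending 7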
