MPI C++ matrix addition, function arguments, and function returns

Question

I've been learning C++ from the internet for the past 2 years and finally the need has arisen for me to delve into MPI. I've been scouring stackoverflow and the rest of the internet (including http://people.sc.fsu.edu/~jburkardt/cpp_src/mpi/mpi.html and https://computing.llnl.gov/tutorials/mpi/#LLNL). I think I've got some of the logic down, but I'm having a hard time wrapping my head around the following:

#include <mpi.h>
#include <vector>
using namespace std;

vector<double> function(vector<double> &foo, const vector<double> &bar, int dim, int rows);

int main(int argc, char** argv)
{
    vector<double> result;//represents a regular 1D vector
    int id_proc, tot_proc, root_proc = 0;
    int dim = 4;//number of "columns" in A and B below (placeholder value)
    int rows = 8;//number of "rows" of A and B below (placeholder value)
    vector<double> A(dim*rows), B(dim*rows);//represent matrices as 1D vectors

    MPI::Init(argc,argv);
    id_proc = MPI::COMM_WORLD.Get_rank();
    tot_proc = MPI::COMM_WORLD.Get_size();

    /*
    initialize A and B here on root_proc with RNG and Bcast to everyone else
    */

    //allow all processors to call function() so they can each work on a portion of A
    result = function(A,B,dim,rows);

    //all processors do stuff with A
    //root_proc does stuff with result (doesn't matter if other processors have updated result)

    MPI::Finalize();
    return 0;
}

vector<double> function(vector<double> &foo, const vector<double> &bar, int dim, int rows)
{
    /*
    purpose of function() is two-fold:
    1. update foo because all processors need the updated "matrix"
    2. get the average of the "rows" of foo and return that to main (only root processor needs this)
    */

    vector<double> output(dim,0);

    //add matrices the way I would normally do it in serial
    for (int i = 0; i < rows; i++)
    {
        for (int j = 0; j < dim; j++)
        {
            foo[i*dim + j] += bar[i*dim + j];//perform "matrix" addition (+= ON PURPOSE)
        }
    }

    //obtain average of rows in foo in serial
    for (int i = 0; i < rows; i++)
    {
        for (int j = 0; j < dim; j++)
        {
            output[j] += foo[i*dim + j];//sum rows of A
        }
    }

    for (int j = 0; j < dim; j++)
    {
        output[j] /= rows;//divide to obtain average
    }

    return output;        
}

The code above is to illustrate the concept only. My main concern is to parallelize the matrix addition but what boggles my mind is this:

1) If each processor only works on a portion of that loop (naturally I'd have to modify the loop parameters per processor), what command do I use to merge all portions of A back into a single, updated A that all processors have in their memory? My guess is that I have to do some kind of Alltoall where each processor sends its portion of A to all other processors, but how do I guarantee that (for example) row 3 worked on by processor 3 overwrites row 3 of the other processors, and not row 1 by accident?

2) If I use an Alltoall inside function(), do all processors have to be allowed to step into function(), or can I isolate function() using...

if (id_proc == root_proc)
{
    result = function(A,B,dim,rows);
}

… and then inside function() handle all the parallelization. As silly as it sounds, I'm trying to do a lot of the work on one processor (with broadcasts), and just parallelize the big time-consuming for loops. Just trying to keep the code conceptually simple so I can get my results and move on.

3) For the averaging part, I'm sure I can just use a reducing command if I wanted to parallelize it, correct?

Also, as an aside: is there a way to call Bcast() such that it is blocking? I'd like to use it to synchronize all my processors (boost libraries are not an option). If not then I'll just go with Barrier(). Thank you for your answer to this question, and to the community of stackoverflow for teaching me how to program over the past two years! :)

Recommended answer

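1) The function you are looking for is MPI_Allgather. MPI_Allgather lets you send a row from each processor and receive the result on all processors. The received blocks are stored in rank order, so the row sent by processor 3 always lands in slot 3 of the receive buffer on every processor and cannot overwrite row 1.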
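When the number of rows does not divide evenly among the processors, the variable-count variant MPI_Allgatherv is usually the one to reach for, and it needs a count and a displacement per rank. The following is a minimal sketch of how those two arrays could be computed for a row-major matrix stored in a 1D vector. RowPartition and partition_rows are hypothetical names invented for this illustration, not part of MPI, and the "first ranks take the leftover rows" policy is just one reasonable assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of MPI): describes how a rows x dim matrix,
// stored row-major in a 1D vector, is split into contiguous row blocks.
struct RowPartition {
    std::vector<int> counts; // number of doubles owned by each rank
    std::vector<int> displs; // offset (in doubles) of each rank's block
};

RowPartition partition_rows(int rows, int dim, int nprocs)
{
    assert(rows >= 0 && dim >= 0 && nprocs > 0);
    RowPartition p;
    p.counts.resize(nprocs);
    p.displs.resize(nprocs);
    const int base = rows / nprocs;  // rows that every rank gets
    const int extra = rows % nprocs; // the first `extra` ranks get one more
    int offset = 0;
    for (int r = 0; r < nprocs; ++r) {
        const int my_rows = base + (r < extra ? 1 : 0);
        p.counts[r] = my_rows * dim; // block size in elements, not rows
        p.displs[r] = offset;        // where this rank's block starts
        offset += p.counts[r];
    }
    return p;
}
```

Under these assumptions, rank r would update only the elements from displs[r] to displs[r] + counts[r] of foo, and then every rank would make the same call, along the lines of MPI_Allgatherv(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, foo.data(), p.counts.data(), p.displs.data(), MPI_DOUBLE, MPI_COMM_WORLD). The displacement array is what ties each rank's rows to a fixed position in the merged matrix.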

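2) Yes, you can use only some of the processors in your function. Since MPI functions work with communicators, you have to create a separate communicator for this purpose: a collective call such as Alltoall must be made by every process in the communicator it runs on, so it cannot sit behind if (id_proc == root_proc) while using COMM_WORLD. I don't know how this is implemented in the C++ bindings, but the C bindings use the MPI_Comm_create function.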

3) Yes, see MPI_Allreduce (or MPI_Reduce if only the root processor needs the result).
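To make the reduction idea concrete, here is a serial sketch of the arithmetic only. Each rank would sum the columns of the rows it owns, and a sum reduction would add those partial vectors elementwise before dividing by the total row count. The function names partial_column_sums and combine_to_average are hypothetical, and combine_to_average is a plain-C++ stand-in for what the reduction would do across processors; no MPI call is made here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum each column over the rows [row_begin, row_end) of a row-major matrix.
// In an MPI program each rank would call this on its own row range.
std::vector<double> partial_column_sums(const std::vector<double>& foo,
                                        int dim, int row_begin, int row_end)
{
    assert(dim >= 0 && 0 <= row_begin && row_begin <= row_end);
    assert(static_cast<std::size_t>(row_end) * dim <= foo.size());
    std::vector<double> sums(dim, 0.0);
    for (int i = row_begin; i < row_end; ++i)
        for (int j = 0; j < dim; ++j)
            sums[j] += foo[static_cast<std::size_t>(i) * dim + j];
    return sums;
}

// Serial stand-in for a sum reduction: adds the per-rank partial sums
// elementwise, then divides by the total row count to get the averages.
std::vector<double> combine_to_average(
    const std::vector<std::vector<double>>& partials, int dim, int rows)
{
    assert(rows > 0);
    std::vector<double> avg(dim, 0.0);
    for (const std::vector<double>& part : partials)
        for (int j = 0; j < dim; ++j)
            avg[j] += part[j];
    for (int j = 0; j < dim; ++j)
        avg[j] /= rows;
    return avg;
}
```

In a real MPI program the loop over partials would be replaced by a single collective, for example MPI_Reduce(sums.data(), total.data(), dim, MPI_DOUBLE, MPI_SUM, root_proc, MPI_COMM_WORLD), after which only the root divides total by rows. Here total is an assumed receive buffer of length dim.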

Aside: Bcast is already blocking; it blocks a process until the send/receive operation assigned to that process has finished. If you want to wait for all processors to finish their work (I don't have any idea why you would want to do this), you should use Barrier().

Extra note: I wouldn't recommend using the C++ bindings, as they are deprecated and you won't find specific examples of how to use them. Boost MPI is the library to use if you want C++ bindings; however, it does not cover all of the MPI functions.
