cout slowest processor MPI

Question

I am writing a program using MPI. Each processor executes a for loop:

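#include <boost/mpi.hpp>
#include <iostream>

int main(int argc, char** argv) {
  // Initialises MPI on construction and finalises it on destruction.
  boost::mpi::environment env(argc, argv);

  // Every process runs the whole loop, so each line is printed once per process.
  for (int i = 0; i < 10; ++i) {
    std::cout << "Index " << i << std::endl;  // std::endl already flushes
  }
}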

Is there a way to make the cout happen only on the last processor to reach index i? Or a flag, so that a line is executed only on the last processor to get to it?

Recommended answer

It might look trivial, but what you ask here is actually extremely complex for distributed memory models such as MPI.

In a shared memory environment, such as OpenMP, this would be trivially solved by defining a shared counter that every thread increments atomically and then compares against the number of threads. If the two match, all threads have passed that point and the current thread is the last one, so it takes care of the printing.
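
As a rough illustration of that shared-counter idea, here is a minimal sketch. It uses std::thread and std::atomic from the C++ standard library rather than OpenMP, only to stay dependency-free, and the helper name lastArrival is hypothetical, made up for this example.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: starts numThreads threads and returns the id of the
// thread that reached the shared counter last, i.e. the one that would print.
int lastArrival(int numThreads) {
    std::atomic<int> counter{0};  // shared counter, visible to all threads
    std::atomic<int> last{-1};    // id of the thread that arrived last

    std::vector<std::thread> threads;
    for (int id = 0; id < numThreads; ++id) {
        threads.emplace_back([&counter, &last, numThreads, id] {
            // fetch_add returns the value *before* the increment, so the
            // last thread to arrive observes numThreads - 1.
            if (counter.fetch_add(1) == numThreads - 1) {
                last.store(id);   // exactly one thread reaches this line
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return last.load();
}
```

Calling lastArrival(4) returns some id between 0 and 3. Which one depends on thread scheduling, so it can differ from run to run. The fetch-and-increment has to be a single atomic step: reading the counter separately after incrementing it could let two threads both believe they were last.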

In a distributed environment, defining and updating such a shared variable is very complex, since each process might run on a remote machine. To allow for it nonetheless, MPI has offered memory windows and one-sided communication since MPI-2.0. However, even with those, it wasn't possible to properly implement an atomic counter increment while also reliably getting its value. Only with MPI 3.0 and the introduction of the MPI_Fetch_and_op() function did this become possible. Here is an example implementation:

#include <mpi.h>
#include <iostream>

int main( int argc, char *argv[] ) {

    // initialisation and inquiring of rank and size
    MPI_Init( &argc, &argv);

    int rank, size;
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    // creation of the "shared" counter on process of rank 0
    int *addr = 0, winSz = 0;
    if ( rank == 0 ) {
        winSz = sizeof( int );
        MPI_Alloc_mem( winSz, MPI_INFO_NULL, &addr );
        *addr = 1; // initialised to 1 since MPI_Fetch_and_op returns value *before* increment
    }
    MPI_Win win;
    MPI_Win_create( addr, winSz, sizeof( int ), MPI_INFO_NULL, MPI_COMM_WORLD, &win );

    // atomic incrementation of the counter
    int counter, one = 1;
    MPI_Win_lock( MPI_LOCK_EXCLUSIVE, 0, 0, win );
    MPI_Fetch_and_op( &one, &counter, MPI_INT, 0, 0, MPI_SUM, win );
    MPI_Win_unlock( 0, win );

    // checking the value of the counter and printing by last in time process
    if ( counter == size ) {
        std::cout << "Process #" << rank << " did the last update" << std::endl;
    }

    // cleaning up
    MPI_Win_free( &win );
    if ( rank == 0 ) {
        MPI_Free_mem( addr );
    }
    MPI_Finalize();

    return 0;
}

As you can see, this is quite lengthy and complex for such a trivial request. Moreover, it requires MPI 3.0 support.

Unfortunately, Boost.MPI, which seems to be your target, only "supports the majority of functionality in MPI 1.1". So if you really want this functionality, you'll have to use some plain MPI programming.
