MPI Slave processes hang when there is no more work


Problem description


I have a serial C++ program that I wish to parallelize. I know the basics of MPI, MPI_Send, MPI_Recv, etc. Basically, I have a data generation algorithm that runs significantly faster than the data processing algorithm. Currently they run in series, but I was thinking that running the data generation in the root process, having the data processing done on the slave processes, and sending a message from the root to a slave containing the data to be processed. This way, each slave processes a data set and then waits for its next data set.


The problem is that, once the root process is done generating data, the program hangs because the slaves are waiting for more.

Here is an example that demonstrates the problem:

#include "mpi.h"

#include <cassert>
#include <cstdio>

class Generator {
  public:
    Generator(int min, int max) : value(min - 1), max(max) {}
    bool NextValue() {
      ++value;
      return value < max;
    }
    int Value() { return value; }
  private:
    int value, max;

    Generator() {}
    Generator(const Generator &other) {}
    Generator &operator=(const Generator &other) { return *this; }
};

long fibonnaci(int n) {
  assert(n > 0);
  if (n == 1 || n == 2) return 1;
  return fibonnaci(n-1) + fibonnaci(n-2);
}

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);

  int rank, num_procs;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &num_procs);

  if (rank == 0) {
    Generator generator(1, 2 * num_procs);
    int proc = 1;
    while (generator.NextValue()) {
      int value = generator.Value();
      MPI_Send(&value, 1, MPI_INT, proc, 73, MPI_COMM_WORLD);
      printf("** Sent %d to process %d.\n", value, proc);
      proc = proc % (num_procs - 1) + 1;
    }
  } else {
    while (true) {
      int value;
      MPI_Status status;
      MPI_Recv(&value, 1, MPI_INT, 0, 73, MPI_COMM_WORLD, &status);
      printf("** Received %d from process %d.\n", value, status.MPI_SOURCE);
      printf("Process %d computed %ld.\n", rank, fibonnaci(2 * (value + 10)));
    }
  }

  MPI_Finalize();
  return 0;
}


Obviously not everything above is "good practice", but it is sufficient to get the point across.


If I remove the while(true) from the slave processes, then the program exits when each of the slaves have exited. I would like the program to exit only after the root process has done its job AND all of the slaves have processed everything that has been sent.


If I knew how many data sets would be generated, I could have that many process running and everything would exit nicely, but that isn't the case here.


Any suggestions? Is there anything in the API that will do this? Could this be solved better with a better topology? Would MPI_Isend or MPI_IRecv do this better? I am fairly new to MPI so bear with me.

Thanks.

Answer


The usual practice is to send to all worker processes an empty message with a special tag that signals them to exit the infinite processing loop. Let's say this tag is 42. You would do something like that in the worker loop:

while (true) {
  int value;
  MPI_Status status;
  MPI_Recv(&value, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
  if (status.MPI_TAG == 42) {
    printf("Process %d exiting work loop.\n", rank);
    break;
  }
  printf("** Received %d from process %d.\n", value, status.MPI_SOURCE);
  printf("Process %d computed %ld.\n", rank, fibonnaci(2 * (value + 10)));
}


The manager process would do something like this after the generator loop:

for (int i = 1; i < num_procs; i++)
  MPI_Send(&i, 0, MPI_INT, i, 42, MPI_COMM_WORLD);


Regarding your next question: using MPI_Isend() in the master process would deserialise the execution and increase performance. The truth, however, is that you are sending very small messages, and those are typically buffered internally (warning - implementation dependent!), so your MPI_Send() is effectively non-blocking and you already have non-serial execution. MPI_Isend() returns an MPI_Request handle that you need to take care of later. You could wait for it to finish with MPI_Wait() or MPI_Waitall(), or you could call MPI_Request_free() on it and it will be freed automatically once the operation completes. The latter is usually done when you want to send many messages asynchronously without caring when the sends complete, but it is bad practice nevertheless, since a large number of outstanding requests can consume lots of precious memory. As for the worker processes - they need the data in order to proceed with the computation, so using MPI_Irecv() is not necessary.


Welcome to the wonderful world of MPI programming!
