MPI help on how to parallelize my code

Question

I am very much a newbie in this subject and need help on how to parallelize my code. I have a large 1D array that in reality describes a 3D volume: 21x21x21 single-precision values. I have 3 computers that I want to engage in the computation. The operation performed on each cell in the grid (volume) is identical for all cells: the program takes in some data, performs some simple arithmetic on it, and assigns the return value to the grid cell.

My non-parallelized code is:

float zg, yg, xg;
float *stack_result = new float[Nz*Ny*Nx];
// StRMtrx[8] is the vertical step size, StRMtrx[6] is the vertical starting point
for (int iz=0; iz<Nz; iz++) {
  zg = iz*StRMtrx[8]+StRMtrx[6];  // find the vertical position in meters
  // StRMtrx[5] is the crossline step size, StRMtrx[3] is the crossline starting point
  for (int iy=0; iy<Ny; iy++) {
    yg = iy*StRMtrx[5]+StRMtrx[3];  // find the crossline position
    // StRMtrx[2] is the inline step size, StRMtrx[0] is the inline starting point
    for (int ix=0; ix < Nx; ix++) {
      xg = ix*StRMtrx[2]+StRMtrx[0]; // find the inline position
      // do stacking on each grid cell
      // "Geoph" is the geophone ids, "Ngeo" is the number of geophones involved,
      // "pahse_use" is the wave type, "EnvMtrx" is the input data common to all
      // cells, "Mdata" is the length of input data
      stack_result[ix+Nx*iy+Nx*Ny*iz] =
        stack_for_qds(Geoph, Ngeo, phase_use, xg, yg, zg, EnvMtrx, Mdata);  
    }        
  }
}

Now I take in 3 computers and divide the volume into 3 vertical segments, so I would then have 3 sub-volumes of 21x21x7 cells each (note that the volume is traversed in z, y, x order). The variable "stack_result" is the complete volume. My parallelized version (which utterly fails; I only get one of the sub-volumes back) is:

MPI_Status status;
int rank, numProcs, rootProcess;
int ierr, offset;
ierr = MPI_Init(&argc, &argv);
ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
ierr = MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
int rowsInZ = Nz/numProcs;  // 7 cells in Z (vertical)
int chunkSize = Nx*Ny*rowsInZ;
float *stack_result = new float[Nz*Ny*Nx];
float zg, yg, xg;
rootProcess = 0;
if(rank == rootProcess) {
  offset = 0;
  for (int n = 1; n < numProcs; n++) { 
    // send rank
    MPI_Send(&n, 1, MPI_INT, n, 2, MPI_COMM_WORLD);
    // send the offset in array
    MPI_Send(&offset, 1, MPI_INT, n, 2, MPI_COMM_WORLD);
    // send volume, now only filled with zeros,
    MPI_Send(&stack_result[offset], chunkSize, MPI_FLOAT, n, 1, MPI_COMM_WORLD);
    offset = offset+chunkSize;
  }
  // receive results
  for (int n = 1; n < numProcs; n++) { 
    int source = n;
    MPI_Recv(&offset, 1, MPI_INT, source, 2, MPI_COMM_WORLD, &status);
    MPI_Recv(&stack_result[offset], chunkSize, MPI_FLOAT, source, 1, MPI_COMM_WORLD, &status);
  }
}  else {
  int rank;
  int source = 0;
  int ierr = MPI_Recv(&rank, 1, MPI_INT, source, 2, MPI_COMM_WORLD, &status);
  ierr = MPI_Recv(&offset, 1, MPI_INT, source, 2, MPI_COMM_WORLD, &status);
  ierr = MPI_Recv(&stack_result[offset], chunkSize, MPI_FLOAT, source, 1, MPI_COMM_WORLD, &status);       
  int nz = rowsInZ;  // sub-volume vertical length
  int startZ = (rank-1)*rowsInZ;
  for (int iz = startZ; iz < startZ+nz; iz++) {
    zg = iz*StRMtrx[8]+StRMtrx[6];
    for (int iy = 0; iy < Ny; iy++) {
      yg = iy*StRMtrx[5]+StRMtrx[3];
      for (int ix = 0; ix < Nx; ix++) {
        xg = ix*StRMtrx[2]+StRMtrx[0];
        stack_result[offset+ix+Nx*iy+Nx*Ny*iz]=
          stack_for_qds(Geoph, Ngeo, phase_use, xg, yg, zg, EnvMtrx, Mdata);
      }  // x-loop
    }  // y-loop
  }   // z-loop
  MPI_Send(&offset, 1, MPI_INT, source, 2, MPI_COMM_WORLD);
  MPI_Send(&stack_result[offset], chunkSize, MPI_FLOAT, source, 1, MPI_COMM_WORLD);
}  // else
write("stackresult.dat", stack_result);
delete [] stack_result;
MPI_Finalize();

Thanks in advance for your patience.

Answer

You are calling write("stackresult.dat", stack_result); in all MPI ranks. As a result, they all write to and thus overwrite the same file, and what you see is the content written by the last MPI process to execute that statement. You should move the write into the body of the if (rank == rootProcess) conditional so that only the root process writes.
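
For example, the end of the program could then look like this (just a sketch, reusing the variable names and the write call from the question):

if (rank == rootProcess) {
  // only the root process, which has already received all sub-volumes, writes the file
  write("stackresult.dat", stack_result);
}
delete [] stack_result;
MPI_Finalize();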

As a side note, sending the value of the rank is redundant - MPI already assigns each process a rank that ranges from 0 to #processes - 1. That also makes sending the offset redundant, since each MPI process can easily compute the offset on its own based on its rank.
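
For instance, each worker rank could derive these values itself instead of receiving them (a sketch assuming the same decomposition as in the question, with worker ranks starting at 1):

// no MPI_Send/MPI_Recv of the rank or the offset needed;
// chunkSize = Nx*Ny*rowsInZ and rowsInZ = Nz/numProcs as before
int offset = (rank - 1) * chunkSize;  // start of this rank's slice in the full array
int startZ = (rank - 1) * rowsInZ;    // first z-plane handled by this rank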
