具有散点图收集的MPI矩阵乘法 [英] MPI Matrix Multiplication with scatter gather

查看:118
本文介绍了具有散点图收集的MPI矩阵乘法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用C中的MPI进行矩阵乘法,并且我们必须执行一个顺序版本和一个并行版本.我的并行版本没有给出正确的答案,我不确定为什么.我认为我没有向流程发送正确的通信,但是我不确定.教授只是浏览了不同的发送/接收/收集等消息,但并没有真正涉及太多细节...我已经看到了很多不同的示例,但是都没有完成,也没有使用分散/收集.如果有人可以看一下我的代码,并告诉我是否有任何弹出的代码,我将不胜感激.我很确定我的问题出在散布/聚集消息或c矩阵的实际计算中.

I'm trying to do matrix multiplication using MPI in C and we have to do a version that sequential and one parallel version. My parallel version is not giving the correct answers and I'm not sure why. I think I'm not sending the right communications to the processes but I can't be sure. The professor just went over the different send/receive/gather etc messages, but didn't really get into much detail... I've seen a lot of different examples but none complete and none using scatter/gather. If anyone can take a look at my code and tell me if anything pops out at them I'd appreciate it. I'm pretty sure my problem is in the scatter/gather messages or the actual calculation of the c matrix.

#define N 512

#include <stdio.h>
#include <math.h>
#include <sys/time.h>
#include <stdlib.h>
#include <stddef.h>
#include "mpi.h"


print_results(char *prompt, float a[N][N]);

int main(int argc, char *argv[])
{
    int i, j, k, rank, size, tag = 99, blksz, sum = 0;
    float a[N][N], b[N][N], c[N][N];
    char *usage = "Usage: %s file\n";
    FILE *fd;
    double elapsed_time, start_time, end_time;
    struct timeval tv1, tv2;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (argc < 2) {
            fprintf (stderr, usage, argv[0]);
            return -1;
    }

    if ((fd = fopen (argv[1], "r")) == NULL) {
            fprintf (stderr, "%s: Cannot open file %s for reading.\n",
                                    argv[0], argv[1]);
            fprintf (stderr, usage, argv[0]);
            return -1;
    }

    for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                    fscanf (fd, "%f", &a[i][j]);

    for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                    fscanf (fd, "%f", &b[i][j]);

    MPI_Barrier(MPI_COMM_WORLD);

    gettimeofday(&tv1, NULL);

    MPI_Scatter(a, N*N/size, MPI_INT, a, N*N/size, MPI_INT, 0,
                                                    MPI_COMM_WORLD);
    MPI_Bcast(b, N*N, MPI_INT, 0, MPI_COMM_WORLD);


    if (rank != 0) {
            for (i = 0; i < N; i++)
            {
                    for (j = 0; j < N; j++)
                    {
                            for (k = 0; k < N; k++)
                            {
                                    sum = sum + a[i][k] * b[k][j];
                            }
                            c[i][j] = sum;
                            sum = 0;
                    }
            }
    }

    MPI_Gather(c, N*N/size, MPI_INT, c, N*N/size, MPI_INT, 0,
                                                     MPI_COMM_WORLD);
    MPI_Finalize();

    gettimeofday(&tv2, NULL);

    elapsed_time = (tv2.tv_sec - tv1.tv_sec) + ((tv2.tv_usec - tv1.tv_usec)/1000000.0);

    printf ("elapsed_time=\t%lf (seconds)\n", elapsed_time);

    print_results("C = ", c);
}

print_results(char *prompt, float a[N][N])
{
    int i, j;

    printf ("\n\n%s\n", prompt);
    for (i = 0; i < N; i++) {
            for (j = 0; j < N; j++) {
                    printf(" %.2f", a[i][j]);
            }
            printf ("\n");
    }
    printf ("\n\n");
}

代码的更新部分:

for (i=0;i<size; i++)
            {
                    if (rank == i)
                    {
                            for (i = rank*(N/size); i < (rank*(N/size)+(N/size)); i++)
                            {
                                    for (j = rank*(N/size); j < (rank*(N/size)+(N/size)); j++)
                                    {
                                            for (k = rank*N; k < rank*N+N; k++)
                                            {
                                                    sum = sum + a[i][k] * b[k][j];
                                            }
                                            c[i][j] = sum;
                                            sum = 0;
                                    }
                            }
                    }
            }

推荐答案

代码中的第一个问题是size可能不会划分N.这意味着长度为N*N/size的散射size数据包不一定发送整个矩阵.这可能是最难解决的问题.

A first problem in your code is that size might not divide N. Which means that scattering size packets of length N*N/size does not necessarily send the whole matrix. This is probably the hardest point to get right.

正如Greg Inozemtsev指出的那样,第二个问题是您将进程0排除在计算之外,尽管它负责矩阵的一部分.

As Greg Inozemtsev points out, a second problem is that you exclude process 0 from the computation, although it is responsible for a part of the matrix.

还有一个问题是,所有I/O操作(从开头读取系数并在结尾输出结果)只能由进程0完成.

And yet another problem is that all I/O operations (reading the coefficients at the beginning and outputting the results at the end) should be done only by process 0.

另一方面,您应该在前向声明和定义中指定print_result函数的返回类型(在这种情况下为void).

On another note, you should specify the return type (void in this case) of your print_result function, both in the forward declaration and in the definition.

这篇关于具有散点图收集的MPI矩阵乘法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆