Parallelizing matrix times a vector by columns and by rows with OpenMP

Problem description

For some homework I have, I need to implement the multiplication of a matrix by a vector, parallelizing it by rows and by columns. I do understand the row version, but I am a little confused by the column version.

Suppose we have the following data:

And the code for the row version:

#pragma omp parallel default(none) shared(i,v2,v1,matrix,tam) private(j)
  {
#pragma omp for
    for (i = 0; i < tam; i++)
      for (j = 0; j < tam; j++){
//        printf("Thread %d did %d,%d\n", omp_get_thread_num(), i, j);
        v2[i] += matrix[i][j] * v1[j];
      }
  }

Here the calculations are done right and the result is correct.
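To see why the row version is safe, here is a minimal self-contained sketch of it. The question's data declarations are not shown, so the layout is an assumption: a square n-by-n matrix of doubles stored row-major in a flat array (`a[i * n + j]` stands in for `matrix[i][j]`), and `matvec_rows` is a hypothetical name. Each row `i` is handled by exactly one thread, so every `v2[i]` (here `y[i]`) has a single writer.

```c
#include <stddef.h>

/* Row-parallel y += A*x for a row-major n-by-n matrix in a flat array.
 * matvec_rows is a hypothetical name; a[i*n + j] plays the role of matrix[i][j]. */
void matvec_rows(int n, const double *a, const double *x, double *y)
{
    /* Each iteration i runs on exactly one thread, so y[i] has a single
     * writer and no synchronization is needed. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double sum = 0.0;              /* private: declared inside the loop */
        for (int j = 0; j < n; j++)
            sum += a[(size_t)i * n + j] * x[j];
        y[i] += sum;                   /* same "+=" semantics as the question */
    }
}
```

With the 3x3 matrix printed in the question and, as an assumed input, `x = {1, 2, 3}`, a zeroed `y` ends up as `{20, 26, 32}`, the same V2 the question reports. Accumulating into the local `sum` and writing `y[i]` once also keeps shared memory out of the inner loop.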

Column version:

#pragma omp parallel default(none) shared(j,v2,v1,matrix,tam) private(i)
  {
    for (i = 0; i < tam; i++)
#pragma omp for
      for (j = 0; j < tam; j++) {
//            printf("Thread %d did %d,%d\n", omp_get_thread_num(), i, j);
        v2[i] += matrix[i][j] * v1[j];
      }
  }

Here, because of how the parallelization is done, the result varies from run to run depending on which thread executes each column. But something interesting happens (I would guess it is because of compiler optimizations): if I uncomment the printf, the results are the same as in the row version and therefore correct, for example:

Thread 0 did 0,0
Thread 2 did 0,2
Thread 1 did 0,1
Thread 2 did 1,2
Thread 1 did 1,1
Thread 0 did 1,0
Thread 2 did 2,2
Thread 1 did 2,1
Thread 0 did 2,0

 2.000000  3.000000  4.000000 
 3.000000  4.000000  5.000000 
 4.000000  5.000000  6.000000 


V2:
20.000000, 26.000000, 32.000000,

This is right, but if I remove the printf:

V2:
18.000000, 11.000000, 28.000000,

What kind of mechanism should I use to get the column version right?

Note: I care more about the explanation than about any code you may post as an answer, because what I really want is to understand what is going wrong in the column version.

I've found a way to get rid of the private vector proposed by Z boson in his answer. I've replaced that vector with a single variable; here is the code:

    #pragma omp parallel
      {
        double sLocal = 0;
        int i, j;
        for (i = 0; i < tam; i++) {
    #pragma omp for
          for (j = 0; j < tam; j++) {
            sLocal += matrix[i][j] * v1[j];
          }
    #pragma omp critical
          {
            v2[i] += sLocal;
            sLocal = 0;
          }
        }
      }
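The scalar-plus-critical pattern above is a hand-written reduction. OpenMP also has a built-in mechanism for exactly this, the `reduction` clause; the sketch below uses it as a swapped-in technique (it is not the asker's or the answerer's code). The flat row-major layout and the name `matvec_cols_reduction` are assumptions.

```c
#include <stddef.h>

/* Column-parallel y += A*x using an OpenMP reduction for each row.
 * matvec_cols_reduction is a hypothetical name; a is row-major n-by-n. */
void matvec_cols_reduction(int n, const double *a, const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        /* The columns j of row i are split among the threads. Each thread
         * accumulates into its own private copy of sum, and OpenMP adds the
         * private copies into this sum when the loop ends. */
        #pragma omp parallel for reduction(+:sum)
        for (int j = 0; j < n; j++)
            sum += a[(size_t)i * n + j] * x[j];
        y[i] += sum;                   /* only the initial thread writes y[i] */
    }
}
```

Opening a parallel region for every row keeps the code short but adds per-row start-up overhead; the version above avoids that by keeping a single region and using `omp for` inside it.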

Recommended answer

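I don't know exactly what your homework means by parallelizing along rows and columns, but I know why your code is not working. You have a race condition when you write to v2[i]: in the column version, all threads work on the same row i at the same time, each handling different columns j, and every one of them does an unsynchronized read-modify-write of v2[i], so some updates are lost. (Uncommenting the printf most likely just changes the timing of the threads so the collisions become rare; it does not remove the race.) You can fix it by making private versions of v2[i], filling them in parallel, and then merging them with a critical section.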

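#pragma omp parallel
{
    double v2_private[tam];   /* per-thread partial sums (double, like sLocal) */
    int i,j;
    for (i = 0; i < tam; i++) v2_private[i] = 0;   /* a VLA cannot take "= {}" before C23 */
    for (i = 0; i < tam; i++) {
        #pragma omp for
        for (j = 0; j < tam; j++) {
            v2_private[i] += matrix[i][j] * v1[j];
        }
    }
    #pragma omp critical
    {
        for(i=0; i<tam; i++) v2[i] += v2_private[i];
    }
}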

I tested this. You can see the results here http://coliru.stacked-crooked.com/a/5ad4153f9579304d
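Since OpenMP 4.5, the reduction clause also accepts array sections in C, which lets the compiler create and merge the per-thread copies of `v2` instead of doing it by hand with a private array and a critical section. A sketch of that alternative, assuming a compiler with OpenMP 4.5 support (for example GCC 6 or later), a flat row-major layout, and the hypothetical name `matvec_cols_array_reduction`:

```c
#include <stddef.h>

/* Column-parallel y += A*x with an OpenMP 4.5 array-section reduction.
 * matvec_cols_array_reduction is a hypothetical name; a is row-major n-by-n. */
void matvec_cols_array_reduction(int n, const double *a, const double *x, double *y)
{
    /* Each thread gets a zero-initialized private copy of y[0..n-1]; when the
     * loop finishes, OpenMP adds every private copy into the original y. */
    #pragma omp parallel for reduction(+:y[0:n])
    for (int j = 0; j < n; j++)            /* threads split the columns */
        for (int i = 0; i < n; i++)
            y[i] += a[(size_t)i * n + j] * x[j];
}
```

Here each thread takes whole columns, so there is a single parallel loop and no barrier per row. The inner loop walks down a column of a row-major matrix, which is cache-unfriendly for large `n`; the row version does not have that problem.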

Note that I did not explicitly define anything as shared or private. It's not necessary to do so. Some people think you should explicitly define everything; I personally think the opposite. By defining i and j (and v2_private) inside the parallel section, they are made private.
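For comparison with the implicit scoping described above, here is a sketch of the answer's private-array approach with every data-sharing attribute spelled out. With `default(none)`, each variable declared outside the region must be listed, while variables declared inside it (`y_private`, `i`, `j`) are private without any clause. The flat row-major layout and the name `matvec_cols_explicit` are assumptions.

```c
#include <stddef.h>

/* The answer's private-array approach with explicit data-sharing clauses.
 * matvec_cols_explicit is a hypothetical name; a is row-major n-by-n. */
void matvec_cols_explicit(int n, const double *a, const double *x, double *y)
{
    /* default(none): everything declared outside must be listed. */
    #pragma omp parallel default(none) shared(n, a, x, y)
    {
        double y_private[n];            /* declared inside: private to each thread */
        for (int i = 0; i < n; i++)
            y_private[i] = 0.0;

        for (int i = 0; i < n; i++) {
            #pragma omp for             /* columns of row i split across threads */
            for (int j = 0; j < n; j++)
                y_private[i] += a[(size_t)i * n + j] * x[j];
        }                               /* implicit barrier after each omp for */

        #pragma omp critical            /* merge the partial sums one thread at a time */
        for (int i = 0; i < n; i++)
            y[i] += y_private[i];
    }
}
```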
