Parallelizing matrix times a vector by columns and by rows with OpenMP
Question
For some homework I have, I need to implement the multiplication of a matrix by a vector, parallelizing it by rows and by columns. I do understand the row version, but I am a little confused by the column version.
Suppose we have the following data:
And the code for the row version:
#pragma omp parallel default(none) shared(i,v2,v1,matrix,tam) private(j)
{
    #pragma omp for
    for (i = 0; i < tam; i++)
        for (j = 0; j < tam; j++) {
            // printf("Hebra %d hizo %d,%d\n", omp_get_thread_num(), i, j);
            v2[i] += matrix[i][j] * v1[j];
        }
}
Here the calculations are done right and the result is correct.
The column version:
#pragma omp parallel default(none) shared(j,v2,v1,matrix,tam) private(i)
{
    for (i = 0; i < tam; i++)
        #pragma omp for
        for (j = 0; j < tam; j++) {
            // printf("Hebra %d hizo %d,%d\n", omp_get_thread_num(), i, j);
            v2[i] += matrix[i][j] * v1[j];
        }
}
Here, due to how the parallelization is done, the result varies on each execution depending on which thread executes each column. But something interesting happens (and I would think it is because of compiler optimizations): if I uncomment the printf, then the results are all the same as in the row version, and therefore correct. For example:
Thread 0 did 0,0
Thread 2 did 0,2
Thread 1 did 0,1
Thread 2 did 1,2
Thread 1 did 1,1
Thread 0 did 1,0
Thread 2 did 2,2
Thread 1 did 2,1
Thread 0 did 2,0
2.000000 3.000000 4.000000
3.000000 4.000000 5.000000
4.000000 5.000000 6.000000
V2:
20.000000, 26.000000, 32.000000,
This is right, but if I remove the printf:
V2:
18.000000, 11.000000, 28.000000,
What kind of mechanism should I use to get the column version right?
Note: I care more about the explanation than about any code you may post as an answer, because what I really want is to understand what is going wrong in the column version.
Update: I've found a way to get rid of the private vector proposed by Z boson in his answer. I've replaced that vector with a single variable; here is the code:
#pragma omp parallel
{
    double sLocal = 0;
    int i, j;
    for (i = 0; i < tam; i++) {
        #pragma omp for
        for (j = 0; j < tam; j++) {
            sLocal += matrix[i][j] * v1[j];
        }
        #pragma omp critical
        {
            v2[i] += sLocal;
            sLocal = 0;
        }
    }
}
Answer
I don't know exactly what your homework means by parallelizing along rows and columns, but I know why your code is not working. You have a race condition when you write to v2[i]: in the column version, several threads update the same element concurrently, so updates get lost. You can fix it by making private versions of v2, filling them in parallel, and then merging them with a critical section.
#pragma omp parallel
{
    float v2_private[tam]; // a VLA cannot take an initializer, so zero it explicitly
    int i, j;
    for (i = 0; i < tam; i++) v2_private[i] = 0;
    for (i = 0; i < tam; i++) {
        #pragma omp for
        for (j = 0; j < tam; j++) {
            v2_private[i] += matrix[i][j] * v1[j];
        }
    }
    #pragma omp critical
    {
        for (i = 0; i < tam; i++) v2[i] += v2_private[i];
    }
}
I tested this. You can see the results here: http://coliru.stacked-crooked.com/a/5ad4153f9579304d
Note that I did not explicitly define anything as shared or private. It's not necessary to do so. Some people think you should explicitly define everything; I personally think the opposite. By defining i and j (and v2_private) inside the parallel section, they are made private.