Parallelizing matrix times a vector by columns and by rows with OpenMP
For some homework I have, I need to implement the multiplication of a matrix by a vector, parallelizing it by rows and by columns. I understand the row version, but I am a little confused about the column version.
Let's say we have the following data:
And the code for the row version:
#pragma omp parallel default(none) shared(i,v2,v1,matrix,tam) private(j)
{
    #pragma omp for                      // the outer loop over rows is distributed among the threads
    for (i = 0; i < tam; i++)
        for (j = 0; j < tam; j++) {
            // printf("Hebra %d hizo %d,%d\n", omp_get_thread_num(), i, j);
            v2[i] += matrix[i][j] * v1[j];
        }
}
Here the calculations are done right and the result is correct.
The column version:
#pragma omp parallel default(none) shared(j,v2,v1,matrix,tam) private(i)
{
    for (i = 0; i < tam; i++) {          // the outer row loop runs in every thread
        #pragma omp for                  // the inner loop over columns is the work-shared one
        for (j = 0; j < tam; j++) {
            // printf("Hebra %d hizo %d,%d\n", omp_get_thread_num(), i, j);
            v2[i] += matrix[i][j] * v1[j];
        }
    }
}
Here, because of how the parallelization is done, the result varies from one execution to another depending on which thread executes each column. But something interesting happens (I would think because of compiler optimizations): if I uncomment the printf, then the results are all the same as in the row version and therefore correct, for example:
Thread 0 did 0,0
Thread 2 did 0,2
Thread 1 did 0,1
Thread 2 did 1,2
Thread 1 did 1,1
Thread 0 did 1,0
Thread 2 did 2,2
Thread 1 did 2,1
Thread 0 did 2,0
2.000000 3.000000 4.000000
3.000000 4.000000 5.000000
4.000000 5.000000 6.000000
V2:
20.000000, 26.000000, 32.000000,
The result is right, but if I remove the printf:
V2:
18.000000, 11.000000, 28.000000,
What kind of mechanism should I use to get the column version right?
Note: I care more about the explanation than about any code you may post as an answer, because what I really want is to understand what is going wrong in the column version.
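One minimal mechanism, sketched here with the same i, j, v1, v2, matrix and tam as above, is to make the conflicting update itself atomic; it is correct, but it serializes every single accumulation, so it is usually slow:

#pragma omp parallel default(none) shared(v2,v1,matrix,tam) private(i,j)
{
    for (i = 0; i < tam; i++) {
        #pragma omp for
        for (j = 0; j < tam; j++) {
            #pragma omp atomic           // the += on the shared v2[i] can no longer be interleaved
            v2[i] += matrix[i][j] * v1[j];
        }
    }
}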
EDIT
I've found a way to get rid of the private vector proposed by Z boson in his answer. I've replaced that vector with a scalar variable; here is the code:
#pragma omp parallel
{
    double sLocal = 0;                   // per-thread partial sum for the current row
    int i, j;
    for (i = 0; i < tam; i++) {
        #pragma omp for
        for (j = 0; j < tam; j++) {
            sLocal += matrix[i][j] * v1[j];
        }
        #pragma omp critical
        {
            v2[i] += sLocal;             // each thread adds its partial sum for row i
            sLocal = 0;
        }
    }
}
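For comparison, the same per-row partial sum can be expressed with an OpenMP reduction clause instead of the critical section. This is only a sketch, assuming the same tam, matrix, v1 and v2 as above:

int i, j;
for (i = 0; i < tam; i++) {
    double s = 0.0;
    // Each thread accumulates a private partial sum over its share of the
    // columns; the reduction combines them into s when the loop finishes.
    #pragma omp parallel for reduction(+:s)
    for (j = 0; j < tam; j++) {
        s += matrix[i][j] * v1[j];
    }
    v2[i] += s;                          // only the serial part writes v2[i], so there is no race
}

It is simpler, but it opens and closes a parallel region once per row, whereas the version above keeps a single region alive for the whole computation.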
I don't know exactly what your homework means by parallelizing along rows and columns, but I know why your code is not working: you have a race condition when you write to v2[i]. You can fix it by making private versions of v2[i], filling them in parallel, and then merging them with a critical section.
#pragma omp parallel
{
    float v2_private[tam] = {};          // per-thread copy of the result vector
    int i, j;
    for (i = 0; i < tam; i++) {
        #pragma omp for
        for (j = 0; j < tam; j++) {
            v2_private[i] += matrix[i][j] * v1[j];
        }
    }
    #pragma omp critical
    {
        for (i = 0; i < tam; i++) v2[i] += v2_private[i];
    }
}
I tested this. You can see the results here http://coliru.stacked-crooked.com/a/5ad4153f9579304d
Note that I did not explicitly define anything as shared or private. It's not necessary to do so. Some people think you should explicitly define everything; I personally think the opposite. By defining i and j (and v2_private) inside the parallel section, they are made private.
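For comparison, here is a sketch of the same region with every data-sharing attribute spelled out. It is purely illustrative and assumes a hypothetical fixed size TAM so that the now-private array can be declared before the region:

enum { TAM = 3 };                        // hypothetical fixed problem size, for illustration only

void matvec_columns_explicit(float matrix[TAM][TAM], float v1[TAM], float v2[TAM])
{
    int i, j;
    float v2_private[TAM];

    #pragma omp parallel default(none) shared(matrix,v1,v2) private(i,j,v2_private)
    {
        // Private copies are not zero-initialized automatically, so clear them here.
        for (i = 0; i < TAM; i++) v2_private[i] = 0.0f;

        for (i = 0; i < TAM; i++) {
            #pragma omp for
            for (j = 0; j < TAM; j++) {
                v2_private[i] += matrix[i][j] * v1[j];
            }
        }

        #pragma omp critical
        {
            for (i = 0; i < TAM; i++) v2[i] += v2_private[i];
        }
    }
}

Functionally this is the same as the snippet above; the only practical difference is that the private array has to be cleared explicitly, which the = {} initializer did implicitly when the array was declared inside the region.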