Parallelizing matrix times a vector by columns and by rows with OpenMP
Question
For some homework I have, I need to implement the multiplication of a matrix by a vector, parallelizing it by rows and by columns. I do understand the row version, but I am a little confused by the column version.
Suppose we have the following data:
And the code for the row version:
#pragma omp parallel default(none) shared(i,v2,v1,matrix,tam) private(j)
{
    #pragma omp for
    for (i = 0; i < tam; i++)
        for (j = 0; j < tam; j++) {
            // printf("Hebra %d hizo %d,%d\n", omp_get_thread_num(), i, j);
            v2[i] += matrix[i][j] * v1[j];
        }
}
Here the calculations are done right and the result is correct.
The column version:
#pragma omp parallel default(none) shared(j,v2,v1,matrix,tam) private(i)
{
    for (i = 0; i < tam; i++)
        #pragma omp for
        for (j = 0; j < tam; j++) {
            // printf("Hebra %d hizo %d,%d\n", omp_get_thread_num(), i, j);
            v2[i] += matrix[i][j] * v1[j];
        }
}
Here, due to how the parallelization is done, the result varies on each execution depending on which thread executes each column. But something interesting happens (and I would think it is because of compiler optimizations): if I uncomment the printf, then the results are all the same as in the row version, and therefore correct. For example:
Thread 0 did 0,0
Thread 2 did 0,2
Thread 1 did 0,1
Thread 2 did 1,2
Thread 1 did 1,1
Thread 0 did 1,0
Thread 2 did 2,2
Thread 1 did 2,1
Thread 0 did 2,0
2.000000 3.000000 4.000000
3.000000 4.000000 5.000000
4.000000 5.000000 6.000000
V2:
20.000000, 26.000000, 32.000000,
This is right, but if I remove the printf:
V2:
18.000000, 11.000000, 28.000000,
What kind of mechanism should I use to get the column version right?
Note: I care more about the explanation than about any code you may post as an answer, because what I really want is to understand what is going wrong in the column version.
Update: I've found a way to get rid of the private vector proposed by Z boson in his answer. I've replaced that vector with a single variable; here is the code:
#pragma omp parallel
{
    double sLocal = 0;
    int i, j;
    for (i = 0; i < tam; i++) {
        #pragma omp for
        for (j = 0; j < tam; j++) {
            sLocal += matrix[i][j] * v1[j];
        }
        #pragma omp critical
        {
            v2[i] += sLocal;
            sLocal = 0;
        }
    }
}
Answer
I don't know exactly what your homework means by parallelizing along rows and columns, but I know why your code is not working. You have a race condition when you write to v2[i]: in the column version, several threads update the same element concurrently, so updates get lost. You can fix it by making private versions of v2, filling them in parallel, and then merging them with a critical section.
#pragma omp parallel
{
    float v2_private[tam]; // a VLA cannot take an initializer, so zero it explicitly
    int i, j;
    for (i = 0; i < tam; i++) v2_private[i] = 0;
    for (i = 0; i < tam; i++) {
        #pragma omp for
        for (j = 0; j < tam; j++) {
            v2_private[i] += matrix[i][j] * v1[j];
        }
    }
    #pragma omp critical
    {
        for (i = 0; i < tam; i++) v2[i] += v2_private[i];
    }
}
I tested this. You can see the results here: http://coliru.stacked-crooked.com/a/5ad4153f9579304d
Note that I did not explicitly define anything as shared or private. It's not necessary to do so. Some people think you should explicitly define everything; I personally think the opposite. By defining i and j (and v2_private) inside the parallel section, they are made private.