Optimising, and why is OpenMP much slower than the sequential way?
Question
I am new to programming with OpenMP. I wrote a simple C program that multiplies a matrix by a vector. Unfortunately, when I compared execution times, I found that the OpenMP version is much slower than the sequential one.
Here is my code (the matrix is N*N int, the vector is N int, and the result is N long long):
#pragma omp parallel for private(i,j) shared(matrix,vector,result,m_size)
for (i = 0; i < m_size; i++)
{
    for (j = 0; j < m_size; j++)
    {
        result[i] += matrix[i][j] * vector[j];
    }
}
Here is the code for the sequential way:
for (i = 0; i < m_size; i++)
    for (j = 0; j < m_size; j++)
        result[i] += matrix[i][j] * vector[j];
When I tried these two implementations with a 999x999 matrix and a 999-element vector, the execution times were:
Sequential: 5439 ms
Parallel: 11120 ms
I really cannot understand why the OpenMP version is so much slower than the sequential algorithm (over 2 times slower!). Can anyone help me solve this problem?
Recommended answer
Because when OpenMP distributes the work among threads, there is a lot of administration and synchronisation going on to ensure that the values in your shared matrix and vector are not corrupted somehow. They are in fact read-only: a human sees that easily, but your compiler may not.
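To see how large that overhead is on a given machine, it may help to time both versions with the same wall-clock timer. The sketch below is only an illustration under assumptions: the matrix is stored as a flat row-major array (the question does not show its declaration), and the names wall_seconds, matvec_seq, matvec_par and time_matvec are hypothetical. It uses omp_get_wtime() when built with OpenMP; clock() is only a fallback for single-threaded builds, since in a multi-threaded run clock() typically reports CPU time summed over all threads rather than elapsed time.

```c
#include <stddef.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper type: any matrix-vector routine with this signature. */
typedef void (*matvec_fn)(int n, const int *matrix, const int *vector,
                          long long *result);

/* Wall-clock seconds. The clock() fallback is only used when the program
   is built without OpenMP, i.e. when a single thread is running. */
static double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Sequential version: result[i] = sum over j of matrix[i*n + j] * vector[j].
   Assumption: row-major flat storage instead of matrix[i][j]. */
static void matvec_seq(int n, const int *matrix, const int *vector,
                       long long *result)
{
    for (int i = 0; i < n; i++) {
        result[i] = 0;
        for (int j = 0; j < n; j++)
            /* Widen before multiplying so the product is not computed in int. */
            result[i] += (long long)matrix[(size_t)i * n + j] * vector[j];
    }
}

/* Parallel version: rows are distributed across threads. Each thread
   writes only its own result[i], so no extra synchronisation is needed. */
static void matvec_par(int n, const int *matrix, const int *vector,
                       long long *result)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        result[i] = 0;
        for (int j = 0; j < n; j++)
            result[i] += (long long)matrix[(size_t)i * n + j] * vector[j];
    }
}

/* Run f `repeats` times and return the elapsed wall-clock seconds. */
static double time_matvec(matvec_fn f, int n, const int *matrix,
                          const int *vector, long long *result, int repeats)
{
    double start = wall_seconds();
    for (int r = 0; r < repeats; r++)
        f(n, matrix, vector, result);
    return wall_seconds() - start;
}
```

Calling time_matvec(matvec_seq, ...) and time_matvec(matvec_par, ...) on the same data with the same repeat count gives two numbers that can be compared directly. The program needs to be compiled with the compiler's OpenMP option (for example -fopenmp with GCC); otherwise the pragma is ignored and both versions run sequentially.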
Things to try out for pedagogic reasons:
0) What happens if matrix and vector are not shared?
1) Parallelize the inner "j-loop" first and keep the outer "i-loop" serial. See what happens.
2) Do not collect the sum in result[i]; collect it in a variable temp and assign its contents to result[i] only after the inner loop has finished, to avoid repeated index lookups. Don't forget to initialise temp to 0 before the inner loop starts.
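The three experiments above could be sketched as follows. This is a minimal illustration, not the asker's actual program: the flat row-major matrix layout and all function names are assumptions. Two details are my additions rather than part of the suggestions: the reduction(+:temp) clause in the inner-loop variant, without which the threads would race on temp, and the use of firstprivate for experiment 0. With pointer arguments, firstprivate copies only the pointers, so every thread still reads the same underlying data; with true arrays it would copy the whole array for each thread.

```c
#include <stddef.h>

/* Experiment 0: nothing is listed as shared. default(none) forces every
   variable to be named explicitly; firstprivate gives each thread its own
   copy of n and of the three pointers (not of the data they point to). */
static void matvec_firstprivate(int n, const int *matrix, const int *vector,
                                long long *result)
{
    #pragma omp parallel for default(none) firstprivate(n, matrix, vector, result)
    for (int i = 0; i < n; i++) {
        long long temp = 0;
        for (int j = 0; j < n; j++)
            temp += (long long)matrix[(size_t)i * n + j] * vector[j];
        result[i] = temp;
    }
}

/* Experiment 1: outer i-loop serial, inner j-loop parallel. A parallel
   region is entered once per row, so the start-up cost is likely to be
   paid n times. The reduction clause combines the per-thread partial sums. */
static void matvec_inner_parallel(int n, const int *matrix, const int *vector,
                                  long long *result)
{
    for (int i = 0; i < n; i++) {
        const int *row = matrix + (size_t)i * n;
        long long temp = 0;
        #pragma omp parallel for reduction(+:temp)
        for (int j = 0; j < n; j++)
            temp += (long long)row[j] * vector[j];
        result[i] = temp;
    }
}

/* Experiment 2: outer loop parallel, sum collected in a local temp that is
   initialised to 0 before the inner loop and stored to result[i] once. */
static void matvec_temp(int n, const int *matrix, const int *vector,
                        long long *result)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        long long temp = 0;
        for (int j = 0; j < n; j++)
            temp += (long long)matrix[(size_t)i * n + j] * vector[j];
        result[i] = temp;
    }
}
```

All three functions should produce the same result vector; only their running times are expected to differ, which is the point of the exercise.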