Optimising: why is OpenMP much slower than the sequential way?

Problem description

I am a newbie in programming with OpenMP. I wrote a simple C program to multiply a matrix by a vector. Unfortunately, when I compared execution times, I found that the OpenMP version is much slower than the sequential one.

Here is my code (the matrix is N*N int, the vector is N int, and the result is N long long):

#pragma omp parallel for private(i,j) shared(matrix,vector,result,m_size)
for(i=0;i<m_size;i++)
{  
  for(j=0;j<m_size;j++)
  {  
    result[i]+=matrix[i][j]*vector[j];
  }
}

Here is the code for the sequential version:

for (i=0;i<m_size;i++)
        for(j=0;j<m_size;j++)
            result[i] += matrix[i][j] * vector[j];
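
A self-contained version of the two loops, for experimenting with, might look like the sketch below. It assumes the matrix is stored row-major in one flat array (the question does not show how it is allocated), and the function names are hypothetical. It makes two small changes to the question's code: result[i] is zeroed before accumulating, since the += requires that, and each product is widened to long long before multiplying, because matrix[i][j] * vector[j] is computed in int and can overflow before it is added to the long long result.

```c
#include <assert.h>
#include <stdlib.h>

/* Serial version: result[i] = sum over j of matrix[i][j] * vector[j].
   The matrix is assumed to be row-major in a flat array of n*n ints. */
void matvec_serial(int n, const int *matrix, const int *vector, long long *result)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        result[i] = 0;  /* the += below needs a zeroed result */
        for (int j = 0; j < n; j++)
            /* widen before multiplying so int * int cannot overflow */
            result[i] += (long long)matrix[(size_t)i * n + j] * vector[j];
    }
}

/* Parallel version: the rows are divided among the threads.
   Variables declared inside the loops (i and j) are private automatically. */
void matvec_parallel(int n, const int *matrix, const int *vector, long long *result)
{
    assert(n >= 0);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        result[i] = 0;
        for (int j = 0; j < n; j++)
            result[i] += (long long)matrix[(size_t)i * n + j] * vector[j];
    }
}
```

Compile with, e.g., gcc -O2 -fopenmp; without -fopenmp the pragma is ignored and both functions run serially. For timing, a wall-clock timer such as omp_get_wtime() is appropriate: clock() reports processor time, which on Linux, for example, is summed over all threads of the process, so it can make a parallel run look slower than it really is.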

When I ran both implementations with a 999x999 matrix and a vector of length 999, the execution times were:

Sequential: 5439 ms
Parallel: 11120 ms

I really cannot understand why the OpenMP version is so much slower than the sequential algorithm (more than twice as slow!). Can anyone explain this?

Recommended answer

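Because when OpenMP distributes the work among threads, there is administration and synchronisation overhead: a team of threads has to be started (or woken up), the iterations have to be handed out, and all threads wait for each other at the implicit barrier at the end of the loop. OpenMP does not add any locking to protect shared variables such as matrix and vector; shared only means that every thread accesses the same memory. Whether the overhead outweighs the gain depends on how much work each thread receives.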
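
A rough way to see this fixed cost is to time a parallel loop that does almost no work. The sketch below (the function and parameter names are hypothetical) repeats such a loop and returns the average time per loop; comparing that figure with the run time of the serial version gives an idea of whether a given problem size is large enough to amortise it.

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds when compiled with OpenMP; otherwise processor time,
   which is adequate because the program is then single-threaded. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Average time, in seconds, of one parallel loop doing trivial work.
   sink must have room for `iterations` ints; writing to it keeps the loop
   body from being empty. The first repetition typically also pays for
   creating the thread team. */
double parallel_loop_overhead(int repetitions, int iterations, int *sink)
{
    assert(repetitions > 0);
    double start = now_seconds();
    for (int r = 0; r < repetitions; r++) {
        #pragma omp parallel for
        for (int i = 0; i < iterations; i++)
            sink[i] = r + i;
    }
    return (now_seconds() - start) / repetitions;
}
```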

Things to try out for pedagogic reasons:

0) What happens if matrix and vector are not shared?

1) Parallelize the inner "j-loop" first, keep the outer "i-loop" serial. See what happens.

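2) Do not accumulate the sum in result[i]; accumulate it in a local variable temp instead, and assign temp to result[i] only after the inner loop has finished, to avoid repeatedly indexing into (and writing to) result[i]. Don't forget to initialise temp to 0 before the inner loop starts, and declare it inside the outer loop so that each thread gets its own copy.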
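
Items 1 and 2 can be sketched as follows, assuming the matrix is stored row-major in a flat array (the question does not say how it is allocated); the function names are hypothetical. Parallelising only the inner loop is correct only if the partial sums are combined, for example with reduction(+:sum); without that, several threads would update result[i] at the same time. Note that the inner-loop version enters a parallel region once per row, n times in total, so it pays the per-loop overhead far more often than the outer-loop version.

```c
#include <assert.h>
#include <stdlib.h>

/* Item 1: outer i-loop serial, inner j-loop parallel. Each thread sums part
   of row i into its own copy of sum; reduction(+:sum) adds the copies. */
void matvec_inner_parallel(int n, const int *matrix, const int *vector, long long *result)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        long long sum = 0;
        #pragma omp parallel for reduction(+:sum)
        for (int j = 0; j < n; j++)
            sum += (long long)matrix[(size_t)i * n + j] * vector[j];
        result[i] = sum;
    }
}

/* Item 2: outer loop parallel; the row sum is kept in a local temp that
   starts at 0 and is written to result[i] once, after the inner loop.
   temp is declared inside the loop, so every thread has its own copy. */
void matvec_outer_temp(int n, const int *matrix, const int *vector, long long *result)
{
    assert(n >= 0);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        long long temp = 0;
        for (int j = 0; j < n; j++)
            temp += (long long)matrix[(size_t)i * n + j] * vector[j];
        result[i] = temp;
    }
}
```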
