为什么OMP版本比串行版本慢? [英] Why omp version is slower than serial?

查看：101 发布时间：2020/5/21 1:33:03 c++ concurrency openmp

本文介绍了为什么OMP版本比串行版本慢?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

这是对此人的后续问题
现在我有了代码:

It's a follow-up question to this one
Now I have the code:

#include <iostream>
#include <cmath>
#include <omp.h>


#define max(a, b) (a)>(b)?(a):(b)

const int m = 2001;
const int n = 2000;
const int p = 4;

double v[m + 2][m + 2];
double x[m + 2];
double y[m + 2];
double _new[m + 2][m + 2];
double maxdiffA[p + 1];
int icol, jrow;

int main() {
    omp_set_num_threads(p);

    double h = 1.0 / (n + 1);

    double start = omp_get_wtime();

    #pragma omp parallel for private(icol) shared(x, y, v, _new)
    for (icol = 0; icol <= n + 1; ++icol) {
        x[icol] = y[icol] = icol * h;

        _new[icol][0] = v[icol][0] = 6 - 2 * x[icol];

        _new[n + 1][icol] = v[n + 1][icol] = 4 - 2 * y[icol];

        _new[icol][n + 1] = v[icol][n + 1] = 3 - x[icol];

        _new[0][icol] = v[0][icol] = 6 - 3 * y[icol];
    }


    const double eps = 0.01;


    #pragma omp parallel private(icol, jrow) shared(_new, v, maxdiffA)
    {
        while (true) { //for [iters=1 to maxiters by 2]
            #pragma omp single
            for (int i = 0; i < p; i++) maxdiffA[i] = 0;
            #pragma omp for
            for (icol = 1; icol <= n; icol++)
                for (jrow = 1; jrow <= n; jrow++)
                    _new[icol][jrow] =
                            (v[icol - 1][jrow] + v[icol + 1][jrow] + v[icol][jrow - 1] + v[icol][jrow + 1]) / 4;
            #pragma omp for
            for (icol = 1; icol <= n; icol++)
                for (jrow = 1; jrow <= n; jrow++)
                    v[icol][jrow] = (_new[icol - 1][jrow] + _new[icol + 1][jrow] + _new[icol][jrow - 1] +
                                     _new[icol][jrow + 1]) / 4;

            #pragma omp for
            for (icol = 1; icol <= n; icol++)
                for (jrow = 1; jrow <= n; jrow++)
                    maxdiffA[omp_get_thread_num()] = max(maxdiffA[omp_get_thread_num()],
                                                         fabs(_new[icol][jrow] - v[icol][jrow]));

            #pragma omp barrier

            double maxdiff = 0.0;
            for (int k = 0; k < p; ++k) {
                maxdiff = max(maxdiff, maxdiffA[k]);
            }


            if (maxdiff < eps)
                break;
            #pragma omp barrier
            //#pragma omp single
            //std::cout << maxdiff << std::endl;
        }
    }
    double end = omp_get_wtime();
    printf("start = %.16lf\nend = %.16lf\ndiff = %.16lf\n", start, end, end - start);

    return 0;
}

但是为什么它的工作速度比串行模拟慢2-3倍(32秒和18秒):

But why it works 2-3 times slower (32sec vs 18sec) than serial analog:

#include <iostream>
#include <cmath>
#include <omp.h>

#define max(a,b) (a)>(b)?(a):(b)

const int m = 2001;
const int n = 2000;
double v[m + 2][m + 2];
double x[m + 2];
double y[m + 2];
double _new[m + 2][m + 2];

int main() {
    double h = 1.0 / (n + 1);

    double start = omp_get_wtime();

    for (int i = 0; i <= n + 1; ++i) {
        x[i] = y[i] = i * h;

        _new[i][0]=v[i][0] = 6 - 2 * x[i];

        _new[n + 1][i]=v[n + 1][i] = 4 - 2 * y[i];

        _new[i][n + 1]=v[i][n + 1] = 3 - x[i];

        _new[0][i]=v[0][i] = 6 - 3 * y[i];
    }

    const double eps=0.01;
    while(true){ //for [iters=1 to maxiters by 2]
        double maxdiff=0.0;
        for (int i=1;i<=n;i++)
            for (int j=1;j<=n;j++)
                _new[i][j]=(v[i-1][j]+v[i+1][j]+v[i][j-1]+v[i][j+1])/4;
        for (int i=1;i<=n;i++)
            for (int j=1;j<=n;j++)
                v[i][j]=(_new[i-1][j]+_new[i+1][j]+_new[i][j-1]+_new[i][j+1])/4;

        for (int i=1;i<=n;i++)
            for (int j=1;j<=n;j++)
                maxdiff=max(maxdiff, fabs(_new[i][j]-v[i][j]));

        if(maxdiff<eps) break;
        std::cout << maxdiff<<std::endl;
    }

    double end = omp_get_wtime();
    printf("start = %.16lf\nend = %.16lf\ndiff = %.16lf\n", start, end, end - start);

    return 0;
}

同样有趣的是，它看起来像这样

Also interesting that it works SAME TIME as version (I can post it here if you say so) which looks like so

while(true){ //106 iteratins here!!!
#pragma omp paralell for
for(...)
#pragma omp paralell for
for(...)
#pragma omp paralell for
for(...)
}

但是我认为让omp代码变慢的原因是在while循环内生成线程106次...但是没有！然后，线程可能同时写入相同的数组单元.但是，它在哪里发生?我看不到，你能告诉我吗?
也许是因为障碍太多?但是讲师告诉我要像这样实现代码并分析"，也许答案是"Jacobi算法并不意味着并行运行良好"?还是我la脚的编码?

But I thought that what making omp code slow is spawning threads inside while loop 106 times... But no! Then probably threads simultaneously write to the same array cells.. But where does it happen? I don't see it could you show me please?
Maybe it's because too much barriers? But Lecturer told me to implement the code like so and "analyse it" Maybe the answer is "Jacobi algorithm isn't meant to run well in parallel"? Or it's just my lame coding?

为什么OMP版本比串行版本慢? [英] Why omp version is slower than serial?

问题描述

推荐答案

相关文章

C/C++开发最新文章

热门教程

热门工具

登录关闭

为什么OMP版本比串行版本慢? [英] Why omp version is slower than serial?

问题描述

推荐答案

相关文章

C/C++开发最新文章

热门教程

热门工具

登录 关闭

登录关闭