Sequential and parallel versions give different results - Why?


Problem description

I have a nested loop (L and A are fully defined inputs):

    #pragma omp parallel for schedule(guided) shared(L,A) \
    reduction(+:dummy)
    for (i=k+1;i<row;i++){
            for (n=0;n<k;n++){
                #pragma omp atomic
                dummy += L[i][n]*L[k][n];
                L[i][k] = (A[i][k] - dummy)/L[k][k];
            }
            dummy = 0;
    }

And its sequential version:

    for (i=k+1;i<row;i++){
            for (n=0;n<k;n++){
                dummy += L[i][n]*L[k][n];
                L[i][k] = (A[i][k] - dummy)/L[k][k];
            }
            dummy = 0;
    }

The two versions give different results, and the parallel version is much slower than the sequential one.

What may cause the problem?

Edit:

To get rid of the problems caused by the atomic directive, I modified the code as follows:

    #pragma omp parallel for schedule(guided) shared(L,A) \
    private(i)
    for (i=k+1;i<row;i++){
        double dummyy = 0;
        for (n=0;n<k;n++){
            dummyy += L[i][n]*L[k][n];
            L[i][k] = (A[i][k] - dummyy)/L[k][k];
        }
    }

But this did not solve the problem either. The results are still different.

Solution

The difference in results comes from the inner loop variable n, which is shared between threads because it is declared outside the parallel region. The threads overwrite each other's value of n while iterating, which is a data race.

Clarified: the loop variable n should be thread-specific. Either declare it inside the parallel region, for example for (int n = 0; ...), or list it in a private(n) clause on the pragma.
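A minimal sketch of that fix, under a few assumptions that are not in the original post: the matrices have a fixed size `DIM`, and the loop is wrapped in a helper named `update_column` (both names are hypothetical). Declaring `i` and `n` in the `for` statements makes them private to each thread, and the accumulator `sum` is local to each iteration, so neither `atomic` nor `reduction` is needed. Moving the assignment to `L[i][k]` out of the inner loop is my own simplification; it yields the same final value as the question's code whenever k > 0.

```c
#define DIM 4  /* hypothetical fixed matrix size, chosen only for this sketch */

/* Fills column k of L below the diagonal, mirroring the loop in the question.
   Assumes L[k][k] and columns 0..k-1 of L have already been computed. */
void update_column(double L[DIM][DIM], double A[DIM][DIM], int k, int row)
{
    #pragma omp parallel for schedule(guided) shared(L, A)
    for (int i = k + 1; i < row; i++) {      /* i is private: declared in the loop */
        double sum = 0.0;                    /* per-iteration accumulator, never shared */
        for (int n = 0; n < k; n++) {        /* n is private: declared in the loop */
            sum += L[i][n] * L[k][n];
        }
        /* each iteration writes a distinct element L[i][k], so no race here */
        L[i][k] = (A[i][k] - sum) / L[k][k];
    }
}
```

Each row's sum is accumulated by a single thread in the same order as the sequential loop, so this version should reproduce the sequential results. With GCC the pragma takes effect only when compiling with `-fopenmp`; without that flag it is ignored and the loop simply runs sequentially.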
