Intel compiler (C++) issue with OpenMP reduction on std::vector


Question

Since OpenMP 4.0, user-defined reductions are supported. So I defined a reduction on std::vector in C++, exactly as given here. It works fine with GNU/5.4.0 and GNU/6.4.0, but with intel/2018.1.163 the reduction returns random values.

Here is the example:

#include <iostream>
#include <vector>
#include <algorithm>
#include <functional>
#include <omp.h>

#pragma omp declare reduction(vec_double_plus : std::vector<double> : \
                              std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), omp_out.begin(), std::plus<double>())) \
                    initializer(omp_priv = omp_orig)

int main() {

    omp_set_num_threads(4);
    int size = 100;
    std::vector<double> w(size,0);

#pragma omp parallel for reduction(vec_double_plus:w)
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < w.size(); ++j)
            w[j] += 1;

    for(auto i:w)
        if(i != 4)
            std::cout << i << std::endl;

    return 0;
}

Each thread adds 1 to every entry of w (its local copy of w), and at the end all the copies are added together (the reduction). With GNU the result is 4 for every entry of w, but with the Intel compiler it is random. Does anyone have any idea what is happening here?
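To make the expected result concrete, here is a minimal serial sketch of what the reduction should compute, assuming one loop iteration per thread. It does not use OpenMP at all; the helper names `combine` and `emulate_reduction` are hypothetical and only mirror the combiner and initializer from the declare reduction directive above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Same operation as the combiner in the directive: out[i] += in[i].
void combine(std::vector<double>& out, const std::vector<double>& in) {
    assert(out.size() == in.size());
    std::transform(out.begin(), out.end(), in.begin(), out.begin(),
                   std::plus<double>());
}

// Hypothetical serial emulation (not part of OpenMP): one iteration per thread.
std::vector<double> emulate_reduction(std::size_t size, int num_threads) {
    std::vector<double> w(size, 0.0);  // the original variable

    // initializer(omp_priv = omp_orig): each private copy starts as a copy of w.
    std::vector<std::vector<double>> priv(num_threads, w);

    // Loop body: every thread adds 1 to each entry of its private copy.
    for (auto& p : priv)
        for (double& x : p)
            x += 1;

    // End of the region: each private copy is combined into the original.
    for (const auto& p : priv)
        combine(w, p);

    return w;
}
```

Calling `emulate_reduction(100, 4)` yields a vector whose entries are all 4, which matches the GNU result. Note that copying `omp_orig` in the initializer only gives the plain sum because w starts at zero.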

Recommended answer

This appears to be a bug in the Intel compiler. I can reliably reproduce it with a C example that does not involve vectors:

#include <stdio.h>

void my_sum_fun(int* outp, int* inp) {
    printf("%d @ %p += %d @ %p\n", *outp, outp, *inp, inp);
    *outp = *outp + *inp;
}

int my_init(int* orig) {
    printf("orig: %d @ %p\n", *orig, orig);
    return *orig;
}

#pragma omp declare reduction(my_sum : int : my_sum_fun(&omp_out, &omp_in)) initializer(omp_priv = my_init(&omp_orig))

int main()
{   
    int s = 0;
    #pragma omp parallel for reduction(my_sum : s)
    for (int i = 0; i < 2; i++)
        s+= 1;

    printf("sum: %d\n", s);
}

Output:

orig: 0 @ 0x7ffee43ccc80
0 @ 0x7ffee43ccc80 += 1 @ 0x7ffee43cc780
orig: 1 @ 0x7ffee43ccc80
1 @ 0x7ffee43ccc80 += 2 @ 0x2b56d095ca80
sum: 3

The reduction operation is applied to the original variable before the other thread's private copy has been initialized from the original value, so that copy starts from an already-modified value. This leads to the wrong result.
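The two orderings can be replayed serially to see where the 3 comes from. This is only a sketch of the sequence visible in the output above, assuming two threads with one iteration each; the function names are hypothetical and no OpenMP is involved.

```cpp
#include <cassert>

// Expected order: both private copies are initialized before any combine.
int expected_order() {
    int s = 0;    // original variable
    int p0 = s;   // thread 0: omp_priv = omp_orig (0)
    int p1 = s;   // thread 1: omp_priv = omp_orig (0)
    p0 += 1;      // thread 0 runs its iteration
    p1 += 1;      // thread 1 runs its iteration
    s += p0;      // combine thread 0
    s += p1;      // combine thread 1
    return s;
}

// Order seen in the output: thread 0 combines before thread 1 initializes.
int observed_order() {
    int s = 0;    // original variable
    int p0 = s;   // "orig: 0"
    p0 += 1;
    s += p0;      // "0 += 1", the original is now 1
    int p1 = s;   // "orig: 1", initialized from the modified original
    p1 += 1;      // private copy is now 2
    s += p1;      // "1 += 2"
    return s;
}

// Small self-check of both orderings.
void check_orders() {
    assert(expected_order() == 2);
    assert(observed_order() == 3);
}
```

The second function reproduces the "sum: 3" line: the first thread's contribution is counted twice because it leaks into the second thread's private copy.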

You can manually add a barrier as a workaround:

#pragma omp parallel reduction(vec_double_plus : w)
{
  #pragma omp for
  for (int i = 0; i < 4; ++i)
    for (int j = 0; j < w.size(); ++j)
      w[j] += 1;
  #pragma omp barrier
}
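For completeness, here is a sketch of the workaround placed in a self-contained function, under a few assumptions: the wrapper `sum_with_barrier` and the checker `all_equal_to` are hypothetical names, the `num_threads(4)` clause replaces the `omp_set_num_threads(4)` call from the question, and the inner loop index is `std::size_t`. It has not been verified against the Intel compiler here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

#pragma omp declare reduction(vec_double_plus : std::vector<double> : \
    std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), omp_out.begin(), std::plus<double>())) \
    initializer(omp_priv = omp_orig)

// Hypothetical wrapper around the workaround from the answer.
std::vector<double> sum_with_barrier(std::size_t size) {
    std::vector<double> w(size, 0.0);

    // The reduction is attached to the parallel region, not to the loop.
    #pragma omp parallel num_threads(4) reduction(vec_double_plus : w)
    {
        #pragma omp for
        for (int i = 0; i < 4; ++i)
            for (std::size_t j = 0; j < w.size(); ++j)
                w[j] += 1;
        // Explicit barrier, as suggested in the answer.
        #pragma omp barrier
    }
    return w;
}

// True if every entry of v equals expected.
bool all_equal_to(const std::vector<double>& v, double expected) {
    return std::all_of(v.begin(), v.end(),
                       [expected](double x) { return x == expected; });
}

// Small self-check: every entry should be 4 after the reduction.
void check_workaround() {
    assert(all_equal_to(sum_with_barrier(100), 4.0));
}
```

Built without OpenMP support, the pragmas are ignored and the loop runs serially, which also gives 4 in every entry.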
