OpenMP reduction on container elements

Problem Description

I have a nested loop with few outer and many inner iterations. In the inner loop, I need to calculate a sum, so I want to use an OpenMP reduction. The outer loop is over a container, so the reduction is supposed to happen on an element of that container. Here's a minimal contrived example:

#include <omp.h>
#include <vector>
#include <iostream>

int main(){
    constexpr int n { 128 };

    std::vector<int> vec (4, 0);
    for (unsigned int i {0}; i < vec.size(); ++i) {

        /* this does not work */
        //#pragma omp parallel for reduction (+:vec[i])
        //for (int j = 0; j < n; ++j)
        //    vec[i] += j;

        /* this works */
        int* val { &vec[0] };
        #pragma omp parallel for reduction (+:val[i])
        for (int j = 0; j < n; ++j)
            val[i] += j;

        /* this is allowed, but looks very wrong. Produces wrong results
         * for std::vector, but on an Eigen type, it worked. */
        #pragma omp parallel for reduction (+:val[i])
        for (int j = 0; j < n; ++j)
            vec[i] += j;
    }
    for (unsigned int i = 0; i < vec.size(); ++i) std::cout << vec[i] << " ";
    std::cout << "\n";

    return 0;
}

The problem is that if I write the reduction clause as (+:vec[i]), I get the error ‘vec’ does not have pointer or array type, which is descriptive enough to find a workaround. However, that means I have to introduce a new variable and somewhat change the code logic, and it becomes less obvious what the code is supposed to do.

My main question is whether there is a better/cleaner/more standard way to write a reduction for container elements.

I'd also like to know why and how the third way shown in the code above somewhat works. I'm actually working with the Eigen library, on whose containers that variant seems to work just fine (haven't extensively tested it though), but on std::vector, it produces results somewhere between zero and the actual result (8128). I thought it should work, because vec[i] and val[i] should both evaluate to dereferencing the same address. But alas, apparently not.

I'm using OpenMP 4.5 and gcc 9.3.0.

Answer

I'll answer your question in three parts:

1. What is the best way to perform OpenMP reductions in your example above with a std::vector?

i) Use your approach, i.e. create a pointer int* val { &vec[0] };

ii) Declare a new shared variable like @1201ProgramAlarm answered (a minimal sketch of that pattern follows after this list).

iii) Declare a user-defined reduction (which is not really applicable in your simple case, but see 3. below for a more efficient pattern).
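
For reference, approach ii) boils down to reducing into a plain local scalar and writing it back to the container once per outer iteration. A minimal sketch of that pattern (my own paraphrase, not @1201ProgramAlarm's literal code):

for (unsigned int i {0}; i < vec.size(); ++i) {
    int sum {0};                      // a plain int is a valid reduction list item
    #pragma omp parallel for reduction(+ : sum)
    for (int j = 0; j < n; ++j)
        sum += j;
    vec[i] = sum;                     // single write back, outside the parallel loop
}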

2. Why does the third loop not work, and why does it work with Eigen?

Like the previous answer states, you are telling OpenMP to perform a reduction sum on the list item val[i], which each thread then privatizes to its own storage, but you are performing the additions on vec[i], which still refers to the shared original element. This means the reduction declaration is effectively bypassed and your additions are subject to the usual thread race conditions.

You don't really provide much detail about your Eigen venture, but here are some possible explanations:

i) You're not really using multiple threads (check n = Eigen::nbThreads())

ii) You didn't disable Eigen's own parallelism, which can disrupt your own usage of OpenMP, e.g. via the EIGEN_DONT_PARALLELIZE compiler directive (a sketch checking both points follows after this list).

iii) The race condition is there, but you're not seeing it because Eigen operations take longer, you're using a low number of threads, and you're only writing a small number of values => lower occurrence of threads interfering with each other and producing a wrong result.
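
To rule out i) and ii), a quick check along these lines may help (EIGEN_DONT_PARALLELIZE and Eigen::nbThreads() are Eigen's documented compile-time switch and thread-count query; compile with -fopenmp):

#define EIGEN_DONT_PARALLELIZE // must come before any Eigen header
#include <Eigen/Core>
#include <omp.h>
#include <iostream>

int main(){
    // threads Eigen would use internally (1 with the macro above)
    std::cout << "Eigen threads:  " << Eigen::nbThreads() << "\n";
    // threads available to your own OpenMP regions
    std::cout << "OpenMP threads: " << omp_get_max_threads() << "\n";
}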

3. How should I parallelize this scenario with OpenMP (technically not a question you explicitly asked)?

Instead of parallelizing only the inner loop, you should parallelize both at the same time. The less serial code you have, the better. In this scenario each thread has its own private copy of the vec vector, which gets reduced after all the elements have been summed by their respective thread. This solution is optimal for your presented example, but might run into RAM problems if you're using a very large vector and very many threads (or have very limited RAM).

#pragma omp parallel for collapse(2) reduction(vsum : vec)
for (unsigned int i {0}; i < vec.size(); ++i) {
    for (int j = 0; j < n; ++j) {
        vec[i] += j;
    }
}

where vsum is a user-defined reduction, i.e.

#pragma omp declare reduction(vsum : std::vector<int> :                \
        std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), \
                       omp_out.begin(), std::plus<int>()))             \
    initializer(omp_priv = decltype(omp_orig)(omp_orig.size()))

Declare the reduction before the function where you use it, and you'll be good to go.
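
Put together, a fully assembled sketch for the example from the question might look like this (expected output: 8128 8128 8128 8128, since 0 + 1 + … + 127 = 8128):

#include <vector>
#include <iostream>
#include <algorithm>  // std::transform
#include <functional> // std::plus

// the user-defined reduction, declared at file scope so it is
// visible before the function that uses it
#pragma omp declare reduction(vsum : std::vector<int> :                \
        std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), \
                       omp_out.begin(), std::plus<int>()))             \
    initializer(omp_priv = decltype(omp_orig)(omp_orig.size()))

int main(){
    constexpr int n { 128 };
    std::vector<int> vec (4, 0);

    // each thread sums into its own private copy of vec;
    // the vsum combiner then merges the copies element-wise
    #pragma omp parallel for collapse(2) reduction(vsum : vec)
    for (unsigned int i = 0; i < vec.size(); ++i) {
        for (int j = 0; j < n; ++j) {
            vec[i] += j;
        }
    }

    for (unsigned int i = 0; i < vec.size(); ++i) std::cout << vec[i] << " ";
    std::cout << "\n";
    return 0;
}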
