How does OpenMP use atomic instructions inside the reduction clause?


Question

How does OpenMP use atomic instructions inside the reduction clause? Or does it not rely on atomic instructions at all?

For instance, is the variable sum in the code below accumulated with an atomic '+' operator?

#include <omp.h>
#include <vector>

using namespace std;
int main()
{
  int m = 1000000; 
  vector<int> v(m);
  for (int i = 0; i < m; i++)
    v[i] = i;

  long long sum = 0;  // long long: the total, 499999500000, overflows a 32-bit int
  #pragma omp parallel for reduction(+:sum)
  for (int i = 0; i < m; i++)
    sum += v[i];
}

Recommended answer

"How does OpenMP use atomic instructions inside the reduction clause? Or does it not rely on atomic instructions at all?"

The OpenMP standard does not specify how the reduction clause should be implemented (for example, whether or not it is based on atomic operations), so the answer may vary from one concrete implementation of the standard to another.

"For instance, is the variable sum in the code below accumulated with an atomic '+' operator?"

Nonetheless, the OpenMP standard states the following:

The reduction clause can be used to perform some forms of recurrence calculations (...) in parallel. For parallel and work-sharing constructs, a private copy of each list item is created, one for each implicit task, as if the private clause had been used. (...) The private copy is then initialized as specified above. At the end of the region for which the reduction clause was specified, the original list item is updated by combining its original value with the final value of each of the private copies, using the combiner of the specified reduction-identifier.

Based on that, one can infer that the variables listed in the reduction clause are private and, consequently, are not updated atomically inside the loop. Even if that were not the case, it is unlikely that a concrete implementation of the OpenMP standard would rely on an atomic operation for the statement sum += v[i];, since in this case that would not be the most efficient strategy. For more on why, see the following SO threads:

  1. Why does my parallel code using OpenMP atomic take longer than the serial code?
  2. Why should I use a reduction rather than an atomic variable?
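
To make the contrast concrete, here is a minimal sketch of the two strategies side by side. The helper names sum_with_atomic and sum_with_reduction are made up for illustration. Both return the same total, but the first forces every thread to synchronize on one shared variable at each iteration, while the second only touches per-thread private copies inside the loop.

```cpp
#include <vector>

// Hypothetical helper: protects every single update with an atomic operation.
long long sum_with_atomic(const std::vector<int>& v)
{
  long long sum = 0;
  const long long n = static_cast<long long>(v.size());
  #pragma omp parallel for
  for (long long i = 0; i < n; i++) {
    // All threads contend for the same shared variable on every iteration.
    #pragma omp atomic
    sum += v[i];
  }
  return sum;
}

// Hypothetical helper: lets the reduction clause give each thread a private copy.
long long sum_with_reduction(const std::vector<int>& v)
{
  long long sum = 0;
  const long long n = static_cast<long long>(v.size());
  #pragma omp parallel for reduction(+:sum)
  for (long long i = 0; i < n; i++)
    sum += v[i];  // updates a private copy; no per-iteration synchronization
  return sum;
}
```

How large the gap is depends on the compiler, the runtime, and the hardware, so it is worth timing both versions on the target machine.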

Very informally, a more efficient approach than using atomic is for each thread to have its own copy of the variable sum. At the end of the parallel region, each thread saves its copy into a resource shared among threads; depending on how the reduction is implemented, atomic operations might be used to update that shared resource. The master thread then picks up that resource, reduces its content, and updates the original sum variable accordingly.
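
A hand-written approximation of this idea might look like the sketch below. It is only an illustration under stated assumptions, not what any particular compiler generates: manual_sum and local are hypothetical names, and as a simplification each thread folds its copy directly into the shared total with one atomic update instead of handing it to the master thread.

```cpp
#include <vector>

// Hand-written sketch of a reduction; not what any specific compiler emits.
long long manual_sum(const std::vector<int>& v)
{
  long long sum = 0;  // shared result
  const long long n = static_cast<long long>(v.size());
  #pragma omp parallel
  {
    long long local = 0;  // per-thread copy, starts at the identity of '+'

    #pragma omp for
    for (long long i = 0; i < n; i++)
      local += v[i];  // no synchronization inside the loop

    // One atomic update per thread, rather than one per element.
    #pragma omp atomic
    sum += local;
  }
  return sum;
}
```

The number of synchronized updates here grows with the number of threads, not with the number of elements, which is the main reason this pattern tends to beat a per-iteration atomic.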

More information, from "OpenMP Reductions Under the Hood":

After having revisited parallel reductions in detail you might still have some open questions about how OpenMP actually transforms your sequential code into parallel code. In particular, you might wonder how OpenMP detects the portion in the body of the loop that performs the reduction. As an example, this or a similar code fragment can often be found in code samples:

 #pragma omp parallel for reduction(+:x)
 for (int i = 0; i < n; i++)
     x -= some_value;

You could also use - as the reduction operator (which is actually redundant to +). But how does OpenMP isolate the update step x -= some_value? The discomforting answer is that OpenMP does not detect the update at all! The compiler treats the body of the for-loop like this:

 #pragma omp parallel for reduction(+:x)
 for (int i = 0; i < n; i++)
     x = some_expression_involving_x_or_not(x);

As a result, the modification of x could also be hidden behind an opaque function call. This is a comprehensible decision from the point of view of a compiler developer. Unfortunately, this means that you have to ensure that all updates of x are compatible with the operation defined in the reduction clause.
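
As a small illustration of an update hidden behind a function call, consider the sketch below. The names consume and subtract_all are hypothetical. The update is a subtraction, which remains compatible with reduction(+:x): each private copy starts at 0, accumulates a negative partial result, and the partial results are then added to the original value.

```cpp
#include <vector>

// Hypothetical opaque update: OpenMP does not need to see inside it.
void consume(long long& x, int value)
{
  x -= value;  // a subtraction, still compatible with combining by '+'
}

long long subtract_all(long long start, const std::vector<int>& v)
{
  long long x = start;
  const long long n = static_cast<long long>(v.size());
  #pragma omp parallel for reduction(+:x)
  for (long long i = 0; i < n; i++)
    consume(x, v[i]);  // inside the loop, x names the thread's private copy
  return x;            // start plus the (negative) partial results
}
```

An update such as x *= 2 inside the same loop would not be compatible with '+': the result would then depend on how the iterations happen to be split across threads.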

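The overall execution flow of a reduction can be summarized as follows:

  1. Spawn a team of threads and determine the set of iterations that each thread j has to perform.
  2. Each thread declares a privatized variant of the reduction variable x, initialized with the neutral element e of the corresponding monoid.
  3. All threads perform their iterations, no matter whether or how they involve an update of the privatized variable.
  4. The result is computed as a sequential reduction over the (local) partial results and the global variable x. Finally, the result is written back to x.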
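
The four steps can be mirrored by hand, as in the sketch below. It is written under assumptions and is not a description of any real runtime: reduce_steps, x_private, and partials are hypothetical names, and collecting the partial results in a std::vector guarded by a critical section is just one simple way to publish them.

```cpp
#include <vector>

// Hand-written sketch of the four steps; real runtimes may combine differently.
long long reduce_steps(long long x, const std::vector<int>& v)
{
  const long long n = static_cast<long long>(v.size());
  std::vector<long long> partials;  // shared storage for per-thread results

  // Step 1: spawn the team of threads.
  #pragma omp parallel
  {
    // Step 2: privatized variable, initialized with the neutral element of '+'.
    long long x_private = 0;

    // Step 1 (continued): the iterations are divided among the threads.
    #pragma omp for
    for (long long i = 0; i < n; i++)
      x_private += v[i];  // Step 3: only the private copy is updated

    // Publish this thread's partial result.
    #pragma omp critical
    partials.push_back(x_private);
  }

  // Step 4: sequential reduction over the partial results and the original x.
  for (long long p : partials)
    x += p;
  return x;  // the caller writes this value back to x
}
```

For example, calling reduce_steps(10, v) with v holding 1 through 5 combines the original value 10 with the partial sums of the elements.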
