Initialize variable for omp reduction


Question


The OpenMP standard specifies an initial value for a reduction variable. So do I have to initialize the variable and how would I do that in the following case:

int sum;
//...
for(int it=0;it<maxIt;it++){
#pragma omp parallel
{
  #pragma omp for nowait
  for(int i=0;i<ct;i++)
    arrayX[i]=arrayY[i];

  sum = 0;
  #pragma omp for reduction(+:sum)
  for(int i=0;i<ct;i++)
    sum+=arrayZ[i];
}
//Use sum
}


Note that I use only one parallel region, to minimize overhead and to allow the nowait in the first loop. Using this as-is would lead to a data race (IMO), because a thread still finishing the first loop will reset sum after other threads have already started the second loop.
Of course I can do this at the top of the outer loop instead, but in the general case, and in a large code base, you may forget that you need to (or already did) set it there, which produces unexpected results.
Does "omp single" help here? I suspect that while thread A executes the single, another thread may already have entered the reduction loop. "omp barrier" is possible, but I want to avoid it, as it defeats the "nowait".
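The "top of the outer loop" variant mentioned above can be sketched as follows. This is a minimal sketch, not the question's actual code: `copy_and_sum` is a hypothetical helper name, and the arrays stand in for the question's `arrayX`/`arrayY`/`arrayZ`. The key point is that `sum` is reset before the parallel region exists, so no worker thread can race on the reset, and the `nowait` on the first loop remains safe because that loop never touches `sum`.

```c
/* One iteration of the outer loop, with sum (re)set BEFORE the parallel
   region is entered. copy_and_sum is a hypothetical name for illustration. */
int copy_and_sum(int *arrayX, const int *arrayY, const int *arrayZ, int ct) {
    int sum = 0;                /* reset while only the initial thread runs */
    #pragma omp parallel
    {
        #pragma omp for nowait  /* still safe: this loop never touches sum */
        for (int i = 0; i < ct; i++)
            arrayX[i] = arrayY[i];

        #pragma omp for reduction(+:sum)
        for (int i = 0; i < ct; i++)
            sum += arrayZ[i];
    }                           /* region end: the reduction is complete */
    return sum;
}
```

The outer loop then simply calls this once per iteration; the reset happens in serial code, which is what makes it trivially race-free.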

A final example:

#pragma omp parallel
{
  sum = 0;
  #pragma omp for reduction(+:sum)
  for(int i=0;i<ct;i++)
    sum+=arrayZ[i];
  //Use sum
  sum = 0;
  #pragma omp for reduction(+:sum)
  for(int i=0;i<ct;i++)
    sum+=arrayZ[i];
  //Use sum
}


How would I (re)initialize here?

Answer


This answer is wrong as it makes an assumption that is not in the OpenMP specification. As accepted answers cannot be deleted, I'm leaving it here as an example that one should always doubt and validate code and/or statements found on the Internet.


Actually, the code doesn't exhibit data races:

#pragma omp parallel
{
   ...
   sum = 0;
   #pragma omp for reduction(+:sum)
   for(int i=0;i<ct;i++)
     sum+=arrayZ[i];
   ...
}


What happens here is that a private copy of sum is created inside the worksharing construct and is initialised to 0 (the initialisation value for the + operator). Each local copy is updated by the loop body. Once a given thread has finished, it waits at the implicit barrier present at the end of the for construct. Once all threads have reached the barrier, their local copies of sum are summed together and the result is added to the shared value.
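The initialisation behaviour described above (private copies start at the identity of the operator, and their total is combined into the prior shared value) can be demonstrated with a small sketch. `reduce_into` is a hypothetical helper, not from the question; it is here only to show that `reduction(+:sum)` adds the reduced total onto whatever value `sum` already held.

```c
/* The private copies created by reduction(+:sum) are initialised to 0,
   the identity of +; their combined total is then ADDED to the prior
   shared value. reduce_into is a hypothetical name for illustration. */
int reduce_into(int start, const int *a, int n) {
    int sum = start;                 /* pre-existing shared value */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i];                 /* each thread updates its private copy */
    return sum;                      /* = start + (a[0] + ... + a[n-1]) */
}
```

Calling it with a nonzero `start` makes the semantics visible: the result is `start` plus the array total, which is why forgetting to reset `sum` between reductions silently accumulates stale values.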


It doesn't matter that all threads might execute sum = 0; at different times since its value is only updated once the barrier has been reached. Think of the code above performing something like:

...
sum = 0;
// Start of the for worksharing construct
int local_sum = 0;                     // ^
for(int i = i_start; i < i_end; i++)   // | sum not used here
  local_sum += arrayZ[i];              // v
// Implicit construct barrier
#pragma omp barrier
// Reduction
#pragma omp atomic update
sum += local_sum;
#pragma omp barrier
// End of the worksharing construct
...

The same applies to the second example.
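Given the disclaimer at the top of this answer, a more conservative rewrite of the two-reduction example is worth sketching. This is an assumption-laden sketch, not the answer's method: `two_reductions` is a hypothetical name, and the idea is to consume and reset the shared sum inside a single construct, relying on the implicit barrier at the end of single (rather than on any assumption about when the combine happens) to order the reset before the second loop. This costs one extra synchronisation point, which is exactly the trade-off the question wanted to avoid.

```c
/* Hypothetical race-free variant of the two-reduction example: sum is
   consumed and reset inside a single construct, whose closing implicit
   barrier guarantees every thread observes sum == 0 before any thread
   enters the second reduction loop. */
void two_reductions(const int *arrayZ, int ct, int *first, int *second) {
    int sum = 0;
    #pragma omp parallel
    {
        #pragma omp for reduction(+:sum)
        for (int i = 0; i < ct; i++)
            sum += arrayZ[i];
        /* implicit barrier of the for construct: sum is fully combined */

        #pragma omp single
        {
            *first = sum;           /* use the first result... */
            sum = 0;                /* ...then reset; the single's closing
                                       barrier orders this before loop 2 */
        }

        #pragma omp for reduction(+:sum)
        for (int i = 0; i < ct; i++)
            sum += arrayZ[i];
    }
    *second = sum;                  /* region has ended: combine is complete */
}
```

If every thread needs the intermediate result, it would have to be copied to a private variable after the single's barrier rather than read from the shared sum, since the next single reset could otherwise race with the reads.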

