OpenMP threads "disobey" omp barrier

Problem description

So here's the code:

#pragma omp parallel private (myId)
{
  set_affinity();

  myId = omp_get_thread_num(); 

  if (myId<myConstant)
  {
    #pragma omp for schedule(static,1)
    for(count = 0; count < AnotherConstant; count++)
      {
        //Do stuff, everything runs as it should
      }
  }

  #pragma omp barrier //all threads wait as they should
  #pragma omp single
  {
    //everything in here is executed by one thread as it should be
  }
  #pragma omp barrier //this is the barrier in which threads run ahead
  par_time(cc_time_tot, phi_time_tot, psi_time_tot);
  #pragma omp barrier
}
//do more stuff

Now to explain what's going on. At the start of my parallel region, myId is made private so that every thread has its own correct thread id. set_affinity() controls which thread runs on which core. The issue I have involves the #pragma omp for schedule(static,1).

The block:

  if (myId<myConstant)
  {
    #pragma omp for schedule(static,1)
    for(count = 0; count < AnotherConstant; count++)
      {
        //Do stuff, everything runs as it should
      }
  }

Represents some work that I want to distribute over a certain number of threads, 0 through myConstant-1. On these threads I want to evenly (in the manner which schedule(static,1) does) distribute the iterations of the loop. This is all performed correctly.

Then the code enters a single region, and all commands in there are performed as they should be. But say I set myConstant to 2. Then if I run with 3 threads or more, everything through the single region executes correctly, but threads with id 3 or greater do not wait for all the commands within the single to finish.

Within the single, some functions are called that create tasks, which are carried out by all threads. The threads with id of 3 or more (in general, of myConstant or more) continue on, executing par_time() while the other threads are still carrying out tasks created by the functions executed in the single. par_time() just prints some data for each thread.

If I comment out the pragma omp for schedule(static,1) and just have a single thread execute the for loop (change if statement to if(myId==0) for instance), then everything works. So I'm just not sure why the aforementioned threads are continuing onwards.

Let me know if anything is confusing; it's kind of a specific issue. I was looking to see if anyone saw a flaw in my flow control with OpenMP.

Recommended answer

Section 2.5, "Worksharing Constructs", of the OpenMP V3.0 spec states:

The following restrictions apply to worksharing constructs:

  • Each worksharing region must be encountered by all threads in a team or by none at all.
  • The sequence of worksharing regions and barrier regions encountered must be the same for every thread in a team.

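By putting the worksharing for inside the if, you have violated both of these restrictions, making your program non-conforming: threads with myId >= myConstant never encounter the for region, so not every thread in the team encounters it, and those threads see a different sequence of worksharing and barrier regions than the others. A non-conforming OpenMP program has "unspecified" behavior according to the specification.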

As to which threads will be used to execute the for loop, with the schedule type of "static,1", the first chunk of work - in this case count=0 - will be assigned to thread 0. The next chunk (count=1) will be assigned to thread 1, etc. until all chunks are assigned. If there are more chunks than threads then assignment will restart at thread 0 in a round-robin fashion. You can read the exact wording in the OpenMP spec, section 2.5.1 Loop construct, under description where it talks about the schedule clause.
