OpenMP parallel thread

Question

I need to parallelize this loop. I thought that using OpenMP was a good idea, but I have never studied it before.

#pragma omp parallel for
for(std::set<size_t>::const_iterator it=mesh->NEList[vid].begin();
        it!=mesh->NEList[vid].end(); ++it){
    worst_q = std::min(worst_q, mesh->element_quality(*it));
}

In this case the loop is not parallelized because it uses an iterator and the compiler cannot work out how to split it.

Can you help me?

Recommended answer

OpenMP requires that the controlling predicate of a parallel for loop use one of the following relational operators: <, <=, > or >=. Only random access iterators provide these operators, so OpenMP parallel loops work only with containers that provide random access iterators. std::set provides only bidirectional iterators. You can overcome that limitation with explicit tasks. The reduction can be performed by first reducing partially into variables that are private to each thread, followed by a global reduction over the partial values.
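As a point of comparison, and not the approach this answer takes, the iterator restriction can also be sidestepped by copying the set into a std::vector and looping over an integer index. The sketch below assumes a compiler with OpenMP 3.1 or later, because it relies on the min reduction operator for C++. The free function element_quality() is a hypothetical stand-in for mesh->element_quality(), and worst_quality() is an illustrative name. If OpenMP is disabled the pragma is ignored and the loop simply runs serially.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for mesh->element_quality(); not from the original code.
double element_quality(std::size_t elem) { return 1.0 / (1.0 + elem); }

// Returns the minimum of worst_q and the quality of every element in elems.
double worst_quality(const std::set<std::size_t>& elems, double worst_q) {
    // Copy the set into a vector so the loop can use a plain integer index,
    // which satisfies the canonical loop form required by "parallel for".
    const std::vector<std::size_t> v(elems.begin(), elems.end());
    const long n = static_cast<long>(v.size());

    // Each thread keeps a private minimum; OpenMP combines the private
    // minima with the incoming value of worst_q at the end of the loop.
    #pragma omp parallel for reduction(min : worst_q)
    for (long i = 0; i < n; ++i)
        worst_q = std::min(worst_q, element_quality(v[i]));

    return worst_q;
}
```

The copy costs extra memory and time proportional to the size of the set, so this is only worthwhile when element_quality() is expensive compared with the copy.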

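// Requires <omp.h> (omp_get_num_threads, omp_get_thread_num) and <algorithm> (std::min)
double *t_worst_q;
// Number of t_worst_q[] elements per cache line (64 bytes on x86/x64)
const int cb = 64 / sizeof(*t_worst_q);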

#pragma omp parallel
{
   #pragma omp single
   {
      t_worst_q = new double[omp_get_num_threads() * cb];
      for (int i = 0; i < omp_get_num_threads(); i++)
         t_worst_q[i * cb] = worst_q;
   }

   // Perform partial min reduction using tasks
   #pragma omp single
   {
      for(std::set<size_t>::const_iterator it=mesh->NEList[vid].begin();
          it!=mesh->NEList[vid].end(); ++it) {
         size_t elem = *it;
         #pragma omp task
         {
            int tid = omp_get_thread_num();
            t_worst_q[tid * cb] = std::min(t_worst_q[tid * cb],
                                           mesh->element_quality(elem));
         }
      }
   }

   // Perform global reduction
   #pragma omp critical
   {
      int tid = omp_get_thread_num();
      worst_q = std::min(worst_q, t_worst_q[tid * cb]);
   }
}

delete [] t_worst_q;

(I assume that mesh->element_quality() returns double.)

Some key points:

  • The loop is executed serially by one thread only, but each iteration creates a new task. These tasks are most likely queued for execution by the idle threads.
  • Idle threads waiting at the implicit barrier of the single construct begin consuming tasks as soon as they are created.
  • The value pointed to by it is dereferenced before the task body. If it were dereferenced inside the task body, it would be firstprivate and a copy of the iterator would be created for each task (i.e. on each iteration). This is not what you want.
  • Each thread performs a partial reduction in its private part of t_worst_q[].
  • To prevent performance degradation due to false sharing, the elements of t_worst_q[] that each thread accesses are spaced out so that they end up in separate cache lines. On x86/x64 the cache line is 64 bytes, therefore the thread number is multiplied by cb = 64 / sizeof(double).
  • The global min reduction is performed inside a critical construct to protect worst_q from being accessed by several threads at once. This is for illustrative purposes only, since the reduction could also be performed by a loop in the main thread after the parallel region.
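The last point can be sketched as follows: the per-thread slots are combined by an ordinary serial loop after the parallel region instead of inside a critical construct. This is a minimal sketch under stated assumptions, not the answer's exact code. element_quality() is a hypothetical stand-in for mesh->element_quality(), the helpers thread_count() and thread_id() and the function name worst_quality_tasks() are illustrative, and a std::vector replaces the raw array. The 64-byte cache line is the x86/x64 figure mentioned above.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for mesh->element_quality(); not from the original code.
double element_quality(std::size_t elem) { return 1.0 / (1.0 + elem); }

// Illustrative helpers so the code also builds with OpenMP disabled.
#ifdef _OPENMP
static int thread_count() { return omp_get_num_threads(); }
static int thread_id() { return omp_get_thread_num(); }
#else
static int thread_count() { return 1; }
static int thread_id() { return 0; }
#endif

double worst_quality_tasks(const std::set<std::size_t>& elems, double worst_q) {
    // Slots are 64 bytes apart (assumed cache line size) to avoid false sharing.
    const std::size_t stride = 64 / sizeof(double);
    std::vector<double> partial;  // one padded slot per thread, shared

    #pragma omp parallel
    {
        #pragma omp single
        partial.assign(thread_count() * stride, worst_q);
        // Implicit barrier: every slot is initialised before any task runs.

        #pragma omp single
        {
            for (std::set<std::size_t>::const_iterator it = elems.begin();
                 it != elems.end(); ++it) {
                std::size_t elem = *it;  // dereference before creating the task
                #pragma omp task firstprivate(elem) shared(partial)
                {
                    // Each task updates only the slot of the thread running it.
                    double& slot = partial[thread_id() * stride];
                    slot = std::min(slot, element_quality(elem));
                }
            }
        }  // Implicit barrier: all tasks have completed past this point.
    }

    // Global reduction: a serial loop over the per-thread partial minima.
    for (std::size_t i = 0; i < partial.size(); i += stride)
        worst_q = std::min(worst_q, partial[i]);
    return worst_q;
}
```

Because the final loop runs after the parallel region has ended, no synchronisation is needed around worst_q.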

Note that explicit tasks require a compiler that supports OpenMP 3.0 or later. At the time of writing, this rules out all versions of the Microsoft C/C++ compiler, which supports only OpenMP 2.0.
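One way to check this at compile time is the standard _OPENMP macro, which expands to the release date (yyyymm) of the supported specification: 200805 corresponds to OpenMP 3.0 and 201107 to OpenMP 3.1. The function name openmp_task_support() below is illustrative only.

```cpp
#include <string>

// Describes the task support implied by the compiler's _OPENMP macro.
std::string openmp_task_support() {
#if defined(_OPENMP) && _OPENMP >= 200805
    return "OpenMP 3.0 or later: explicit tasks are available";
#elif defined(_OPENMP)
    return "OpenMP older than 3.0: explicit tasks are not available";
#else
    return "OpenMP is not enabled for this build";
#endif
}
```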
