OpenMP parallel thread
Problem description
I need to parallelize this loop. I thought that using OpenMP would be a good idea, but I have never studied it before.
#pragma omp parallel for
for (std::set<size_t>::const_iterator it = mesh->NEList[vid].begin();
     it != mesh->NEList[vid].end(); ++it) {
    worst_q = std::min(worst_q, mesh->element_quality(*it));
}
In this case the loop is not parallelized, because it uses iterators and the compiler cannot figure out how to split it. Can you help me?
Recommended answer
OpenMP requires that the controlling predicate in parallel for loops uses one of the following relational operators: <, <=, > or >=. Only random access iterators provide these operators, and hence OpenMP parallel loops work only with containers that provide random access iterators. std::set provides only bidirectional iterators. You may overcome that limitation by using explicit tasks. The reduction can be performed by first reducing partially into per-thread private variables, followed by a global reduction over the partial values.
double *t_worst_q;
// Cache line size on x86/x64, in number of t_worst_q[] elements
const int cb = 64 / sizeof(*t_worst_q);

#pragma omp parallel
{
    #pragma omp single
    {
        t_worst_q = new double[omp_get_num_threads() * cb];
        for (int i = 0; i < omp_get_num_threads(); i++)
            t_worst_q[i * cb] = worst_q;
    }

    // Perform partial min reduction using tasks
    #pragma omp single
    {
        for (std::set<size_t>::const_iterator it = mesh->NEList[vid].begin();
             it != mesh->NEList[vid].end(); ++it) {
            size_t elem = *it;
            #pragma omp task
            {
                int tid = omp_get_thread_num();
                t_worst_q[tid * cb] = std::min(t_worst_q[tid * cb],
                                               mesh->element_quality(elem));
            }
        }
    }

    // Perform global reduction
    #pragma omp critical
    {
        int tid = omp_get_thread_num();
        worst_q = std::min(worst_q, t_worst_q[tid * cb]);
    }
}
delete [] t_worst_q;
(I assume that mesh->element_quality() returns double.)
Some key points:
- The loop is executed serially by one thread only, but each iteration creates a new task. These are most likely queued for execution by the idle threads.
- Idle threads waiting at the implicit barrier of the single construct begin consuming tasks as soon as they are created.
- The value pointed to by it is dereferenced before the task body. If it were dereferenced inside the task body, it would be firstprivate and a copy of the iterator would be created for each task (i.e. on each iteration). This is not what you want.
- Each thread performs a partial reduction in its private part of t_worst_q[].
- In order to prevent performance degradation due to false sharing, the elements of t_worst_q[] that each thread accesses are spaced out so that they end up in separate cache lines. On x86/x64 the cache line is 64 bytes, therefore the thread number is multiplied by cb = 64 / sizeof(double).
- The global min reduction is performed inside a critical construct to protect worst_q from being accessed by several threads at once. This is for illustrative purposes only, since the reduction could also be performed by a loop in the main thread after the parallel region.
Note that explicit tasks require a compiler that supports OpenMP 3.0 or 3.1. This rules out all versions of the Microsoft C/C++ compiler (it only supports OpenMP 2.0).