Task-based programming: #pragma omp task versus #pragma omp parallel for


Question

Consider:

    void saxpy_worksharing(float* x, float* y, float a, int N) {
      #pragma omp parallel for
      for (int i = 0; i < N; i++) {
         y[i] = y[i]+a*x[i];
      }
    }

    void saxpy_tasks(float* x, float* y, float a, int N) {
      #pragma omp parallel
      {
         for (int i = 0; i < N; i++) {
         #pragma omp task
         {
           y[i] = y[i]+a*x[i];
         }
      }
   }

What is the difference between using tasks and the omp parallel for directive? Why can we write recursive algorithms such as merge sort with tasks, but not with worksharing?

Answer

I would suggest that you have a look at the OpenMP tutorial from Lawrence Livermore National Laboratory, available here.

Your particular example is one that should not be implemented using OpenMP tasks. The second code creates N times the number of threads tasks (because there is an error in the code besides the missing `}`; I will come back to it later), and each task only performs a very simple computation. The overhead of the tasks would be gigantic, as you can see in my answer to this question. Besides, the second code is conceptually wrong. Since there is no worksharing directive, all threads execute all iterations of the loop, so instead of N tasks, N times the number of threads tasks get created. It should be rewritten in one of the following ways:

Single task producer - common pattern, NUMA unfriendly:

void saxpy_tasks(float* x, float* y, float a, int N) {
   #pragma omp parallel
   {
      #pragma omp single
      {
         for (int i = 0; i < N; i++)
            #pragma omp task
            {
               y[i] = y[i]+a*x[i];
            }
      }
   }
}

The single directive would make the loop run inside a single thread only. All other threads would skip it and hit the implicit barrier at the end of the single construct. As barriers contain implicit task scheduling points, the waiting threads will start processing tasks immediately as they become available.

Parallel task producer - more NUMA friendly:

void saxpy_tasks(float* x, float* y, float a, int N) {
   #pragma omp parallel
   {
      #pragma omp for
      for (int i = 0; i < N; i++)
         #pragma omp task
         {
            y[i] = y[i]+a*x[i];
         }
   }
}

In this case the task creation loop would be shared among the threads.

If you do not know what NUMA is, ignore the comments about NUMA friendliness.

