OpenMP: divide a for loop over cores


Question

I am trying to run an application in parallel using SSE instructions and OpenMP. For the OpenMP part, I have code like this:

for(r = 0; r < end_condition; r++){
    /* ... several nested for loops inside ... */
}

I want to divide this loop over r across multiple cores. For example, with two cores, one core should execute r = 0 .. end_condition/2 - 1 and the other r = end_condition/2 .. end_condition - 1. There is no communication between iterations of the loop, so they can run in parallel. At the end of the r loop the results should be synchronized.

How can I divide the work over the cores this way using OpenMP directives? Do I have to unroll the loop over r and use OpenMP sections?

Thanks in advance.

Recommended answer

With the following code, the compiler generates a parallel region that is executed by N threads.

omp_set_num_threads(N);

#pragma omp parallel for
for(int r = 0; r < end_condition; ++r)
{
    /* ... several nested for loops inside ... */
}

Each thread executes a subset of the iterations from r = 0 to r = end_condition - 1. Note that the loop counter r is now declared inside the for statement of the omp parallel for construct, so each thread has its own copy of the counter.
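To make the parallel for variant concrete, here is a minimal sketch in C. The names `inner_work` and `run_outer_loop` are hypothetical stand-ins for the asker's nested loops, which the original post elides. The sketch assumes each iteration writes only to its own output slot, which matches the statement that iterations do not communicate. It also adds `schedule(static)`, which is not in the answer above: with no chunk size, it hands each thread one contiguous block of roughly equal size, which is the split the question asks for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the nested inner loops: independent work per r. */
static double inner_work(int r)
{
    double acc = 0.0;
    for (int i = 0; i < 100; ++i)
        for (int j = 0; j < 100; ++j)
            acc += (double)((r + i + j) % 7);
    return acc;
}

/* Fills out[0 .. end_condition-1]; iteration r writes only out[r]. */
void run_outer_loop(double *out, int end_condition)
{
    assert(out != NULL && end_condition >= 0);

    /* schedule(static) without a chunk size gives each thread one
       contiguous block of approximately end_condition / nthreads
       iterations. The loop counter r is private to each thread. */
    #pragma omp parallel for schedule(static)
    for (int r = 0; r < end_condition; ++r)
        out[r] = inner_work(r);

    /* The construct ends with an implicit barrier, so all of out[] is
       complete when execution reaches this point. */
}
```

Compiled without OpenMP support (for example without `-fopenmp` on GCC), the pragma is ignored and the loop simply runs serially with the same result.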

The same goal can be achieved using the parallel pragma instead of parallel for, like this:

omp_set_num_threads(N);
int r;
#pragma omp parallel private(r)
{
   int tid = omp_get_thread_num();
   /* thread tid handles one contiguous block of end_condition/N iterations */
   for(r = (end_condition/N) * tid; r < (end_condition/N) * (tid+1); ++r)
   {
      /* ... several nested for loops inside ... */
   }
}

Of course, this only works when end_condition % N == 0, but you should get the idea. Here the variable r is explicitly marked as private to each thread, so it can be declared wherever you want. The compiler will generate a copy for each thread.
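The divisibility restriction can be lifted by spreading the remainder over the first few threads. The sketch below is one way to do that, not the answer author's code: `block_range` and `mark_iterations` are hypothetical names, and the loop body just counts visits in place of the elided nested loops. It also queries `omp_get_num_threads()` inside the region rather than assuming that exactly N threads were created, since the runtime may provide fewer threads than requested.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP support. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Computes the half-open range [*begin, *end) owned by thread tid out of
   nthreads. The first (n % nthreads) threads get one extra iteration. */
void block_range(int n, int nthreads, int tid, int *begin, int *end)
{
    assert(n >= 0 && nthreads > 0 && tid >= 0 && tid < nthreads);
    int base  = n / nthreads;
    int extra = n % nthreads;
    *begin = tid * base + (tid < extra ? tid : extra);
    *end   = *begin + base + (tid < extra ? 1 : 0);
}

/* Placeholder body: increments visits[r] once for every r that is executed. */
void mark_iterations(int *visits, int end_condition)
{
    #pragma omp parallel
    {
        /* Variables declared inside the region are private to each thread. */
        int nthreads = omp_get_num_threads();
        int tid = omp_get_thread_num();
        int begin, end;
        block_range(end_condition, nthreads, tid, &begin, &end);

        for (int r = begin; r < end; ++r)
            visits[r] += 1;
    }
    /* Implicit barrier at the end of the parallel region. */
}
```

For example, with end_condition = 10 and 3 threads, the ranges are [0, 4), [4, 7) and [7, 10), so every iteration runs exactly once.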
