OpenMP divide for loop over cores
Question
I am trying to execute an application in parallel using SSE instructions and OpenMP. For the OpenMP part, I have code like:
for (r = 0; r < end_condition; r++) {
    /* ... several nested for loops inside ... */
}
I want to divide this loop over r across multiple cores. For example, when using two cores, one core should execute r = 0 .. end_condition/2 - 1 and the other r = end_condition/2 .. end_condition - 1. There is no communication between iterations of the loop, so they can run in parallel; at the end of the r loop the results should be synchronized.
How can I divide the loop across the cores this way using OpenMP directives? Do I have to unroll the loop over r and use OpenMP sections?
Thanks in advance.
Recommended answer
With the following code, the compiler generates a parallel region that is executed by N threads.
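#include <omp.h>
omp_set_num_threads(N);
/* schedule(static) without a chunk size gives each thread one contiguous,
   roughly equal block of r values; the default schedule is implementation-defined. */
#pragma omp parallel for schedule(static)
for (int r = 0; r < end_condition; ++r)
{
    /* ... several nested for loops inside ... */
}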
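Each thread executes a subset of the iterations 0 .. end_condition - 1. Note that your counting variable r is now declared inside the for statement of the omp parallel for, so each thread has its own counting variable (the iteration variable of a loop associated with omp for is private to each thread in any case). The implicit barrier at the end of the loop guarantees that all iterations have finished before execution continues, which takes care of synchronizing the results.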
The same goal can be achieved using the parallel pragma instead of parallel for, by computing each thread's range by hand:
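#include <omp.h>
int r; /* declared outside the region; private(r) gives each thread its own copy */
omp_set_num_threads(N);
#pragma omp parallel private(r)
{
    int tid = omp_get_thread_num();
    /* assumes the team really has N threads; thread tid handles
       r in [(end_condition/N)*tid, (end_condition/N)*(tid+1)) */
    for (r = (end_condition / N) * tid; r < (end_condition / N) * (tid + 1); ++r)
    {
        /* ... several nested for loops inside ... */
    }
}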
Of course, this only works when end_condition % N == 0, but you should get the idea. Here the variable r is explicitly marked as private, so it can be declared wherever you want outside the region; the compiler generates a separate (uninitialized) copy of it for each thread.