Doing a section with one thread and a for-loop with multiple threads
Question
I am using OpenMP and I want to spawn threads such that one thread executes one piece of code and finishes, in parallel with N threads running the iterations of a parallel-for loop.
The execution should look like this:
Section A (one thread) || Section B (parallel-for, multiple threads)
| || | | | | | | | | | |
| || | | | | | | | | | |
| || | | | | | | | | | |
| || | | | | | | | | | |
| || | | | | | | | | | |
V || V V V V V V V V V V
I cannot just write a parallel-for with a #pragma omp once because I do not want the thread that executes section A to execute the for-loop.
I have tried this:
#pragma omp parallel sections
{
    #pragma omp section
    {
        // Section A
    }
    #pragma omp section
    {
        // Section B
        #pragma omp parallel for
        for (int i = 0; i < x; ++i)
            something();
    }
}
However, the parallel-for always executes with only one thread. I know this because I made the body of the loop print omp_get_thread_num(), and the values are all the same number, either 1 or 0 depending on which of the two threads executed the second section.
I have also tried:
#pragma omp sections
{
    #pragma omp section
    {
        // Section A
    }
    #pragma omp section
    {
        // Section B
        #pragma omp parallel for
        for (int i = 0; i < x; ++i)
            something();
    }
}
Which allows the for-loop to execute with multiple threads, but it makes the sections non-parallel, and the first section is executed sequentially before the second section.
What I need is a combination of the two approaches, where each iteration of the for-loop and the first section are all run in parallel.
Recommended answer
Nested parallelism must be enabled explicitly, as it is disabled by default in most implementations. According to the OpenMP 4.0 standard, you must set the OMP_NESTED environment variable:
The OMP_NESTED environment variable controls nested parallelism by setting the initial value of the nest-var ICV. The value of this environment variable must be true or false. If the environment variable is set to true, nested parallelism is enabled; if set to false, nested parallelism is disabled. The behavior of the program is implementation defined if the value of OMP_NESTED is neither true nor false.
The following line should work for bash:
export OMP_NESTED=true
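As a rough illustration of how the question's first attempt could look once nesting is enabled, here is a minimal C sketch. It swaps the environment variable for a runtime call, omp_set_max_active_levels(2), which is part of the standard OpenMP API. The names run_sections and inner_threads, and the two-argument something(i, out), are hypothetical stand-ins that do not appear in the original post. On runtimes that predate OpenMP 5.0, the inner region may still run on a single thread unless OMP_NESTED=true is set as well.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for the per-iteration work of section B. */
static void something(int i, int *out)
{
    out[i] = i * i;
}

/* Runs section A on one thread of a two-thread outer team while the
 * loop of section B is shared by a nested team of inner_threads threads.
 * Fills out[0..x-1] and returns 1 once section A has run.
 * Compiled without OpenMP, the pragmas are ignored and everything
 * runs sequentially with the same result. */
int run_sections(int x, int *out, int inner_threads)
{
    int a_done = 0;

    assert(out != NULL && inner_threads > 0);

#ifdef _OPENMP
    /* Assumption: two active levels are enough (sections + inner loop). */
    omp_set_max_active_levels(2);
#endif

    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        {
            /* Section A: executed by exactly one outer thread. */
            a_done = 1;
        }
        #pragma omp section
        {
            /* Section B: the nested team splits the iterations. */
            #pragma omp parallel for num_threads(inner_threads)
            for (int i = 0; i < x; ++i)
                something(i, out);
        }
    }
    return a_done;
}
```

A caller would pass a buffer of at least x ints, for example run_sections(16, buf, 7) on an 8-core machine. The num_threads clauses are one possible choice here; leaving them out and relying on OMP_NUM_THREADS should work as well.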
Furthermore, as noted by @HristoIliev in the comment below, you will very likely want to set the OMP_NUM_THREADS environment variable to tune performance. Quoting the standard:
The value of this environment variable must be a list of positive integer values. The values of the list set the number of threads to use for parallel regions at the corresponding nested levels.
This means that one should set OMP_NUM_THREADS to a value similar to n,n-1 where n is the number of CPU cores. For instance:
export OMP_NUM_THREADS=8,7
for an 8-core system (example copied from the comment below).
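To check whether such a setting actually takes effect, one option is to query the team size at each nesting level rather than printing omp_get_thread_num(), which only reports the thread number within the innermost team. The sketch below is a hypothetical helper (probe_team_sizes is not from the original answer) built on the standard omp_get_num_threads() call.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Stores the team size observed at nesting level 1 in *outer_size and
 * at nesting level 2 in *inner_size. Without OpenMP support, or when
 * nesting is disabled, the affected size is reported as 1. */
void probe_team_sizes(int *outer_size, int *inner_size)
{
    assert(outer_size != NULL && inner_size != NULL);

    *outer_size = 1;
    *inner_size = 1;

#ifdef _OPENMP
    #pragma omp parallel
    {
        /* Only one outer thread records sizes and opens the nested region. */
        #pragma omp single
        {
            *outer_size = omp_get_num_threads();

            #pragma omp parallel
            {
                /* This single binds to the inner team. */
                #pragma omp single
                *inner_size = omp_get_num_threads();
            }
        }
    }
#endif
}
```

With OMP_NUM_THREADS=8,7 and nesting enabled, one would expect the helper to report sizes close to 8 and 7, although the runtime is allowed to grant fewer threads than requested. An inner size of 1 suggests that nested parallelism is still disabled.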