Understanding the collapse clause in OpenMP


Question

I came across some OpenMP code that used the collapse clause, which was new to me. I'm trying to understand what it means, but I don't think I have fully grasped its implications. One definition that I found is:

COLLAPSE: Specifies how many loops in a nested loop should be collapsed into one large iteration space and divided according to the schedule clause. The sequential execution of the iterations in all associated loops determines the order of the iterations in the collapsed iteration space.

I thought I understood what that meant, so I tried the following simple program:

int i, j;
#pragma omp parallel for num_threads(2) private(j)
for (i = 0; i < 4; i++)
    for (j = 0; j <= i; j++)
        printf("%d %d %d\n", i, j, omp_get_thread_num());

which produced

0 0 0
1 0 0
1 1 0
2 0 0
2 1 0
2 2 1
3 0 1
3 1 1
3 2 1
3 3 1

I then added the collapse(2) clause. I expected the same result in the first two columns, but with an equal number of 0's and 1's in the last column. Instead I got

0 0 0
1 0 0
2 0 1
3 0 1

So my questions are:

  1. What is happening in my code?
  2. Under what circumstances should I use collapse?
  3. Can you provide an example that shows the difference between using collapse and not using it?

Recommended answer

The problem with your code is that the iteration count of the inner loop depends on the outer loop variable. According to the OpenMP specification, in the section describing binding and the collapse clause:

If execution of any associated loop changes any of the values used to compute any of the iteration counts, then the behavior is unspecified.

You can use collapse when this is not the case, for example with a rectangular loop nest whose inner bounds do not depend on the outer index:

#pragma omp parallel for private(j) collapse(2)
for (i = 0; i < 4; i++)
    for (j = 0; j < 100; j++)

In fact this is a good example of when to use collapse. The outer loop has only four iterations, so if you have more than four threads, some of them will sit idle. When you collapse the two loops, the 400 combined iterations are distributed among the threads, and 400 is likely to be much greater than the number of threads. Another reason to use collapse is poor load balance. If you parallelize only the four outer iterations and the fourth one takes most of the time, the other threads finish early and wait. With 400 iterations the load is likely to be distributed more evenly.

You can also fuse the 4 x 100 loop nest by hand, like this:

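#pragma omp parallel for
for (int n = 0; n < 4*100; n++) {
    int i = n / 100;  /* outer index, 0..3 */
    int j = n % 100;  /* inner index, 0..99 */
    /* loop body using i and j goes here */
}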

Here is an example showing how to fuse a triply nested loop by hand.

Finally, here is an example showing how to fuse a triangular loop, for which collapse is not defined.

Here is a solution that maps a rectangular loop to the triangular loop in the OP's question. It can be used to fuse the OP's triangular loop.

//int n = 4;
for(int k=0; k<n*(n+1)/2; k++) {
    int i = k/(n+1), j = k%(n+1);
    if(j>i) i = n - i -1, j = n - j;
    printf("(%d,%d)\n", i,j);
}

This works for any value of n.

For the OP's question (n = 4), the map goes from

(0,0),
(1,0), (1,1),
(2,0), (2,1), (2,2),
(3,0), (3,1), (3,2), (3,3),

to

(0,0), (3,3), (3,2), (3,1), (3,0),
(1,0), (1,1), (2,2), (2,1), (2,0),

For odd values of n the map is not exactly a rectangle, but the formula still works.

For example, n = 3 goes from

(0,0),
(1,0), (1,1),
(2,0), (2,1), (2,2),

to

(0,0), (2,2), (2,1), (2,0),
(1,0), (1,1),

Here is code to test this:

#include <stdio.h>
int main(void) {
    int n = 4;
    for(int i=0; i<n; i++) {
        for(int j=0; j<=i; j++) {
            printf("(%d,%d)\n", i,j);
        }
    }
    puts("");
    for(int k=0; k<n*(n+1)/2; k++) {
        int i = k/(n+1), j = k%(n+1);
        if(j>i) i = n - i - 1, j = n - j;
        printf("(%d,%d)\n", i,j);
    }
}
