Parallelizing fortran 2008 `do concurrent` systematically, possibly with openmp

Problem Description

The fortran 2008 do concurrent construct is a do loop that tells the compiler that no iteration affects any other. It can thus be parallelized safely.

A valid example:

program main
  implicit none
  integer :: i
  integer, dimension(10) :: array
  do concurrent (i = 1:10)
    array(i) = i
  end do
end program main

where iterations can be done in any order. You can read more about it here.

To my knowledge, gfortran does not automatically parallelize these do concurrent loops, though I remember a gfortran-diffusion-list mail about doing it (here). It just transforms them into classical do loops.

My question: Do you know a way to systematically parallelize do concurrent loops? For instance with a systematic openmp syntax?

Solution

It is not that easy to do it automatically. The DO CONCURRENT construct has a forall-header, which means that it can accept multiple loops, index variable definitions and a mask. Basically, you need to replace:

DO CONCURRENT([<type-spec> :: ]<forall-triplet-spec 1>, <forall-triplet-spec 2>, ...[, <scalar-mask-expression>])
  <block>
END DO

with:

[BLOCK
    <type-spec> :: <indexes>]

!$omp parallel do
DO <forall-triplet-spec 1>
  DO <forall-triplet-spec 2>
    ...
    [IF (<scalar-mask-expression>) THEN]
      <block>
    [END IF]
    ...
  END DO
END DO
!$omp end parallel do

[END BLOCK]

(things in square brackets are optional, based on the presence of the corresponding parts in the forall-header)

Note that this would not be as effective as parallelising one big loop with <iters 1>*<iters 2>*... independent iterations, which is what DO CONCURRENT is expected to do. Note also that the forall-header permits a type-spec that allows one to define the loop indexes inside the header, in which case you will need to surround the whole thing in a BLOCK ... END BLOCK construct to preserve the semantics. You would also need to check whether a scalar-mask-expr is present at the end of the forall-header and, if it is, put that IF ... END IF inside the innermost loop.
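To make the rule concrete, here is a sketch of such a hand rewrite; the real arrays a and b and the bounds n and m are purely illustrative and assumed to be declared in the enclosing scope. A masked loop nest that declares its indexes in the header, like:

DO CONCURRENT (INTEGER :: i = 1:n, j = 1:m, a(i,j) > 0.0)
  b(i,j) = SQRT(a(i,j))
END DO

would have to be rewritten along these lines:

BLOCK
  INTEGER :: i, j
  ! j would be predetermined private here anyway; listing it makes the intent explicit
  !$omp parallel do private(j)
  DO i = 1, n
    DO j = 1, m
      ! the scalar-mask-expr moves into the innermost loop
      IF (a(i,j) > 0.0) THEN
        b(i,j) = SQRT(a(i,j))
      END IF
    END DO
  END DO
  !$omp end parallel do
END BLOCK

The BLOCK keeps i and j local, just as the INTEGER type-spec in the original header did, and only the outermost loop is shared among the threads, which is why this is less effective than the single collapsed iteration space that DO CONCURRENT describes. (With a rectangular loop nest like this one, a collapse(2) clause could recover the combined iteration space, but the mask test still has to stay inside the loop body.)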

If you only have array assignments inside the body of the DO CONCURRENT, you could also transform it into FORALL and use the workshare OpenMP directive. It would be much easier than the above.

DO CONCURRENT <forall-header>
  <block>
END DO

would become:

!$omp parallel workshare
FORALL <forall-header>
  <block>
END FORALL
!$omp end parallel workshare
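As a minimal sketch (the array c, the index i and the bound n are again illustrative and assumed to be declared in the enclosing scope), a pure array-assignment loop such as:

DO CONCURRENT (i = 1:n)
  c(i) = 2.0 * c(i)
END DO

would then simply become:

!$omp parallel workshare
FORALL (i = 1:n)
  c(i) = 2.0 * c(i)
END FORALL
!$omp end parallel workshare

(but see the edit below on how well workshare actually performs in practice).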

Given all the above, the only systematic way that I can think of is to systematically go through your source code, searching for DO CONCURRENT and replacing it with one of the above transformed constructs based on the content of the forall-header and the loop body.

Edit: Usage of the OpenMP workshare directive is currently discouraged. It turns out that at least the Intel Fortran Compiler and GCC serialise FORALL statements and constructs inside OpenMP workshare directives by surrounding them with an OpenMP single directive during compilation, which brings no speedup whatsoever. Other compilers might implement it differently, but it's better to avoid its usage if portable performance is to be achieved.
