Parallelizing Fortran 2008 `do concurrent` systematically, possibly with OpenMP
Problem Description
The Fortran 2008 do concurrent
construct is a do loop that tells the compiler that no iteration affects any other. It can thus be parallelized safely.
A valid example:
program main
implicit none
integer :: i
integer, dimension(10) :: array
do concurrent (i = 1:10)
array(i) = i
end do
end program main
where iterations can be done in any order. You can read more about it here.
To my knowledge, gfortran does not automatically parallelize these do concurrent
loops, though I remember a gfortran-diffusion-list mail about doing it (here). It just transforms them into classical do
loops.
My question: Do you know a way to systematically parallelize do concurrent
loops? For instance, with a systematic OpenMP syntax?
Solution

It is not that easy to do it automatically. The DO CONCURRENT
construct has a forall-header which means that it could accept multiple loops, index variables definition and a mask. Basically, you need to replace:
DO CONCURRENT([<type-spec> :: ]<forall-triplet-spec 1>, <forall-triplet-spec 2>, ...[, <scalar-mask-expression>])
<block>
END DO
with:
[BLOCK
<type-spec> :: <indexes>]
!$omp parallel do
DO <forall-triplet-spec 1>
DO <forall-triplet-spec 2>
...
[IF (<scalar-mask-expression>) THEN]
<block>
[END IF]
...
END DO
END DO
!$omp end parallel do
[END BLOCK]
(things in square brackets are optional, based on the presence of the corresponding parts in the forall-header)
Note that this would not be as effective as parallelising one big loop with <iters 1>*<iters 2>*...
independent iterations, which is what DO CONCURRENT
is expected to do. Note also that the forall-header permits a type-spec, which allows one to define the loop indexes inside the header; in that case you will need to surround the whole thing in a BLOCK ... END BLOCK
construct to preserve the semantics. You would also need to check whether a scalar-mask-expr exists at the end of the forall-header and, if it does, put that IF ... END IF
inside the innermost loop.
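As a concrete illustration of this transformation, consider a hypothetical two-index, masked construct (the array a and the bounds n and m are invented for this example); adding a collapse(2) clause also helps recover the single large iteration space that DO CONCURRENT implies:

```fortran
! Original: indexes declared in the forall-header, plus a mask
DO CONCURRENT (INTEGER :: i = 1:n, j = 1:m, a(i, j) > 0.0)
   a(i, j) = SQRT(a(i, j))
END DO

! Hand-translated OpenMP form
BLOCK
   INTEGER :: i, j
   !$omp parallel do collapse(2)
   DO i = 1, n
      DO j = 1, m
         ! The scalar-mask-expression becomes an IF in the innermost loop
         IF (a(i, j) > 0.0) THEN
            a(i, j) = SQRT(a(i, j))
         END IF
      END DO
   END DO
   !$omp end parallel do
END BLOCK
```

Note that collapse(2) requires the loops to be perfectly nested, which holds here because the mask test has moved inside the inner loop body.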
If you only have array assignments inside the body of the DO CONCURRENT
you could also transform it into FORALL
and use the workshare
OpenMP directive. It would be much easier than the above.
DO CONCURRENT <forall-header>
<block>
END DO
would become:
!$omp parallel workshare
FORALL <forall-header>
<block>
END FORALL
!$omp end parallel workshare
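For instance (the array names a and b and the bound n are invented for illustration), an assignment-only body:

```fortran
! Original: only an array assignment in the body
DO CONCURRENT (i = 1:n)
   b(i) = 2.0 * a(i)
END DO

! Transformed: FORALL inside a parallel workshare region
!$omp parallel workshare
FORALL (i = 1:n)
   b(i) = 2.0 * a(i)
END FORALL
!$omp end parallel workshare
```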
Given all the above, the only systematic approach I can think of is to go through your source code, search for DO CONCURRENT,
and replace each occurrence with one of the transformed constructs above, based on the content of the forall-header and the loop body.
Edit: Usage of the OpenMP workshare
directive is currently discouraged. It turns out that at least the Intel Fortran Compiler and GCC serialise FORALL
statements and constructs inside OpenMP workshare
directives by surrounding them with an OpenMP single
directive during compilation, which brings no speedup whatsoever. Other compilers might implement it differently, but it is better to avoid workshare if portable performance is to be achieved.
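Given that caveat, a portable fallback sketch for the assignment-only case (the array names a and b and the bound n are invented for illustration) is to write the loop explicitly and use an ordinary parallel do instead of workshare:

```fortran
! Avoids workshare entirely: explicit loop with an ordinary parallel do,
! which all OpenMP-capable Fortran compilers parallelize effectively
!$omp parallel do
DO i = 1, n
   b(i) = 2.0 * a(i)
END DO
!$omp end parallel do
```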