Parallelizing Fortran 2008 `do concurrent` systematically, possibly with OpenMP

Problem description
The Fortran 2008 do concurrent construct is a do loop that tells the compiler that no iteration affects any other. It can thus be parallelized safely.

A valid example:

program main
implicit none
integer :: i
integer, dimension(10) :: array
  do concurrent (i = 1:10)
array(i) = i
end do
end program main
where iterations can be done in any order. You can read more about it here.

To my knowledge, gfortran does not automatically parallelize these do concurrent loops, while I remember a gfortran-diffusion-list mail about doing it (here). It just transforms them into classical do loops.

My question: Do you know a way to systematically parallelize do concurrent loops? For instance with a systematic OpenMP syntax?

Answer

The DO CONCURRENT construct has a forall-header, which means that it can accept multiple loops, index variable definitions, and a mask. Basically, you need to replace:

DO CONCURRENT([<type-spec> :: ]<forall-triplet-spec 1>, <forall-triplet-spec 2>, ...[, <scalar-mask-expression>])
<block>
END DO
with:

[BLOCK
<type-spec> :: <indexes>]
!$omp parallel do
DO <forall-triplet-spec 1>
DO <forall-triplet-spec 2>
...
[IF (<scalar-mask-expression>) THEN]
<block>
[END IF]
...
END DO
END DO
!$omp end parallel do
[END BLOCK]
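To make the template concrete, here is a small self-checking program (my own hypothetical example, not part of the original answer) that applies the transformation by hand to a two-index, masked DO CONCURRENT. The array names, bounds and the i /= j mask are all invented for illustration:

```fortran
! Hypothetical example: a two-index, masked DO CONCURRENT transformed by hand
! into nested DO loops with an OpenMP parallel do and an IF for the mask.
program transform_example
  implicit none
  integer, parameter :: n = 4, m = 3
  real :: a(n,m), b(n,m)
  integer :: i, j

  a = 1.0
  b = 1.0

  ! Original construct:
  !   DO CONCURRENT (i = 1:n, j = 1:m, i /= j)
  !     a(i,j) = a(i,j) + 2.0
  !   END DO

  ! Transformed version following the scheme above; the BLOCK would hold the
  ! index declarations if the forall-header contained a type-spec.
  block
    !$omp parallel do private(j)
    do i = 1, n
      do j = 1, m
        if (i /= j) then
          a(i,j) = a(i,j) + 2.0
        end if
      end do
    end do
    !$omp end parallel do
  end block

  ! Reference computation using DO CONCURRENT itself, to cross-check.
  do concurrent (i = 1:n, j = 1:m, i /= j)
    b(i,j) = b(i,j) + 2.0
  end do

  if (any(a /= b)) error stop "transformed loop disagrees with do concurrent"
  print *, "OK"
end program transform_example
```

Compiling with gfortran -fopenmp enables the directive; without that flag the !$omp lines are plain comments and the program still runs correctly in serial.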
(things in square brackets are optional, based on the presence of the corresponding parts in the forall-header)

Note that this would not be as effective as parallelising one big loop with <iters 1>*<iters 2>*... independent iterations, which is what DO CONCURRENT is expected to do. Note also that the forall-header permits a type-spec that allows one to define loop indexes inside the header, and you will need to surround the whole thing in a BLOCK ... END BLOCK construct to preserve the semantics. You would also need to check whether a scalar-mask-expr exists at the end of the forall-header; if it does, you should also put that IF ... END IF inside the innermost loop.

If you only have array assignments inside the body of the DO CONCURRENT, you could also transform it into FORALL and use the workshare OpenMP directive. It would be much easier than the above.

DO CONCURRENT <forall-header>
<block>
END DO
would become:

!$omp parallel workshare
FORALL <forall-header>
<block>
END FORALL
!$omp end parallel workshare
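For the array-assignment case, a minimal self-checking sketch of the FORALL/workshare rewrite might look as follows (again a hypothetical example; the arrays and the doubling operation are invented):

```fortran
! Hypothetical illustration: an array-assignment-only DO CONCURRENT
! rewritten as FORALL inside an OpenMP workshare construct.
program workshare_example
  implicit none
  integer, parameter :: n = 8
  real :: x(n), y(n)
  integer :: i

  x = [(real(i), i = 1, n)]

  ! Original:
  !   do concurrent (i = 1:n)
  !     y(i) = 2.0 * x(i)
  !   end do

  !$omp parallel workshare
  forall (i = 1:n)
    y(i) = 2.0 * x(i)
  end forall
  !$omp end parallel workshare

  if (any(y /= 2.0 * x)) error stop "unexpected result"
  print *, "OK"
end program workshare_example
```

Note that FORALL was declared obsolescent in Fortran 2018, which is one more reason to prefer the explicit-loop transformation in new code.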
Given all the above, the only systematic way that I can think of is to go through your source code, searching for DO CONCURRENT and systematically replacing it with one of the above transformed constructs, based on the content of the forall-header and the loop body.

Edit: Usage of the OpenMP workshare directive is currently discouraged. It turns out that at least the Intel Fortran Compiler and GCC serialise FORALL
statements and constructs inside OpenMP workshare
directives by surrounding them with OpenMP single
directive during compilation, which brings no speedup whatsoever. Other compilers might implement it differently, but it's better to avoid it if portable performance is to be achieved.
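Given that serialisation issue, a portable fallback is to use the explicit-loop transformation even for pure array assignments. A minimal sketch (hypothetical names, not from the original answer):

```fortran
! Hypothetical sketch: instead of workshare + FORALL, use an explicit loop
! with !$omp parallel do, which parallelizes reliably across compilers.
program paralleldo_example
  implicit none
  integer, parameter :: n = 1000
  real :: x(n), y(n)
  integer :: i

  x = 1.0

  !$omp parallel do
  do i = 1, n
    y(i) = x(i) + 1.0
  end do
  !$omp end parallel do

  if (any(y /= 2.0)) error stop "unexpected result"
  print *, "OK"
end program paralleldo_example
```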