gfortran openmp segmentation fault occurs on basic do loop
Question
I have a program which distributes particles into a cloud-in-cell mesh. It simply loops over the total number of particles (Ntot) and populates a 256^3 mesh (i.e. each particle gets distributed over 8 cells).
% gfortran -fopenmp cic.f90 -o ./cic
This compiles fine. But when I run it (./cic) I get a segmentation fault. My loop is a classic omp do problem. The program works when I don't compile it with OpenMP.
!$omp parallel do
do i = 1,Ntot
if (x1(i).gt.0.and.y1(i).gt.0.and.z1(i).gt.0) then
dense(int(x1(i)),int(y1(i)),int(z1(i))) = dense(int(x1(i)),int(y1(i)),int(z1(i))) &
+ dx1(i) * dy1(i) * dz1(i) * mpart
end if
if (x2(i).le.Ng.and.y1(i).gt.0.and.z1(i).gt.0) then
dense(int(x2(i)),int(y1(i)),int(z1(i))) = dense(int(x2(i)),int(y1(i)),int(z1(i))) &
+ dx2(i) * dy1(i) * dz1(i) * mpart
end if
if (x1(i).gt.0.and.y2(i).le.Ng.and.z1(i).gt.0) then
dense(int(x1(i)),int(y2(i)),int(z1(i))) = dense(int(x1(i)),int(y2(i)),int(z1(i))) &
+ dx1(i) * dy2(i) * dz1(i) * mpart
end if
if (x2(i).le.Ng.and.y2(i).le.Ng.and.z1(i).gt.0) then
dense(int(x2(i)),int(y2(i)),int(z1(i))) = dense(int(x2(i)),int(y2(i)),int(z1(i))) &
+ dx2(i) * dy2(i) * dz1(i) * mpart
end if
if (x1(i).gt.0.and.y1(i).gt.0.and.z2(i).le.Ng) then
dense(int(x1(i)),int(y1(i)),int(z2(i))) = dense(int(x1(i)),int(y1(i)),int(z2(i))) &
+ dx1(i) * dy1(i) * dz2(i) * mpart
end if
if (x2(i).le.Ng.and.y1(i).gt.0.and.z2(i).le.Ng) then
dense(int(x2(i)),int(y1(i)),int(z2(i))) = dense(int(x2(i)),int(y1(i)),int(z2(i))) &
+ dx2(i) * dy1(i) * dz2(i) * mpart
end if
if (x1(i).gt.0.and.y2(i).le.Ng.and.z2(i).le.Ng) then
dense(int(x1(i)),int(y2(i)),int(z2(i))) = dense(int(x1(i)),int(y2(i)),int(z2(i))) &
+ dx1(i) * dy2(i) * dz2(i) * mpart
end if
if (x2(i).le.Ng.and.y2(i).le.Ng.and.z2(i).le.Ng) then
dense(int(x2(i)),int(y2(i)),int(z2(i))) = dense(int(x2(i)),int(y2(i)),int(z2(i))) &
+ dx2(i) * dy2(i) * dz2(i) * mpart
end if
end do
!$omp end parallel do
There are no dependencies between iterations. Ideas?
Answer
This problem, as well as the one in your other question, comes from the fact that automatic static storage for big arrays is disabled when OpenMP is enabled. This means that without -fopenmp, big arrays are automatically placed in static storage (known as the .bss segment) while small arrays are allocated on the stack. When you switch OpenMP support on, no automatic static allocation is used and your dense array gets allocated on the stack of the routine. The default stack limits on OS X are very restrictive, hence the segmentation fault.
You have several options here. The first option is to give dense static storage by adding the SAVE attribute to its declaration. The other option is to explicitly allocate it on the heap by making it ALLOCATABLE and then using an ALLOCATE statement, e.g.:
REAL, DIMENSION(:,:,:), ALLOCATABLE :: dense
ALLOCATE(dense(256,256,256))
! Computations, computations, computations
DEALLOCATE(dense)
Newer Fortran versions support automatic deallocation of ALLOCATABLE arrays without the SAVE attribute when they go out of scope.
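For comparison, the SAVE option mentioned above is a one-line change to the declaration (a sketch; the 256^3 bounds are assumed from the question):

```fortran
! Placed in static storage (.bss) instead of the routine's stack
REAL, DIMENSION(256,256,256), SAVE :: dense
```

Note that a SAVEd array keeps its contents between calls, so it should be zeroed explicitly before the loop if the routine is called more than once.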
Note that your OpenMP directive is just fine and no additional data-sharing clauses are necessary. You do not need to declare i in a PRIVATE clause since loop counters have a predetermined private data-sharing class. You do not need to put the other variables in a SHARED clause as they are implicitly shared. However, the updates that you perform on dense should either be synchronised with ATOMIC UPDATE (or simply ATOMIC on older OpenMP implementations), or you should use REDUCTION(+:dense). Atomic updates are translated to locked additions and should not incur much of a slowdown compared to the huge slowdown from having conditionals inside the loop:
INTEGER :: xi, yi, zi
!$OMP PARALLEL DO PRIVATE(xi,yi,zi)
...
if (x1(i).gt.0.and.y1(i).gt.0.and.z1(i).gt.0) then
xi = int(x1(i))
yi = int(y1(i))
zi = int(z1(i))
!$OMP ATOMIC UPDATE
dense(xi,yi,zi) = dense(xi,yi,zi) &
+ dx1(i) * dy1(i) * dz1(i) * mpart
end if
...
Replicate the code with the proper changes for the other seven cases. If your compiler complains about the UPDATE clause in the ATOMIC construct, simply delete it.
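On implementations that predate OpenMP 3.1, the same protected update would be written with the bare ATOMIC directive (a sketch of one of the eight cases, using the xi/yi/zi temporaries from above):

```fortran
!$OMP ATOMIC
dense(xi,yi,zi) = dense(xi,yi,zi) + dx1(i) * dy1(i) * dz1(i) * mpart
```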
REDUCTION(+:dense) would create one private copy of dense in each thread, which could consume a lot of memory, and the reduction applied at the end would grow slower and slower with the size of dense. For small arrays it would work better than atomic updates.
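A reduction-based version of the loop could be sketched as follows (variable names taken from the question; whole-array reductions are standard in Fortran OpenMP, though support for ALLOCATABLE arrays in REDUCTION varies with compiler version):

```fortran
!$OMP PARALLEL DO REDUCTION(+:dense)
do i = 1, Ntot
   if (x1(i) > 0 .and. y1(i) > 0 .and. z1(i) > 0) then
      ! Each thread accumulates into its own zero-initialised copy of dense;
      ! the per-thread copies are summed into the shared array at the end.
      dense(int(x1(i)),int(y1(i)),int(z1(i))) = &
           dense(int(x1(i)),int(y1(i)),int(z1(i))) + dx1(i)*dy1(i)*dz1(i)*mpart
   end if
   ! ... the remaining seven cases, unchanged and without ATOMIC ...
end do
!$OMP END PARALLEL DO
```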