gfortran openmp segmentation fault occurs on basic do loop
Problem description
I have a program which distributes particles into a cloud-in-cell mesh. Simply loops over the total number of particles (Ntot) and populates a 256^3 mesh (i.e. each particle gets distributed over 8 cells).
% gfortran -fopenmp cic.f90 -o ./cic
It compiles fine, but when I run it (./cic) I get a segmentation fault. I believe my loop is a classic omp do problem. The program works when I don't compile it with OpenMP.
!$omp parallel do
do i = 1,Ntot
   if (x1(i).gt.0.and.y1(i).gt.0.and.z1(i).gt.0) then
      dense(int(x1(i)),int(y1(i)),int(z1(i))) = dense(int(x1(i)),int(y1(i)),int(z1(i))) &
         + dx1(i) * dy1(i) * dz1(i) * mpart
   end if
   if (x2(i).le.Ng.and.y1(i).gt.0.and.z1(i).gt.0) then
      dense(int(x2(i)),int(y1(i)),int(z1(i))) = dense(int(x2(i)),int(y1(i)),int(z1(i))) &
         + dx2(i) * dy1(i) * dz1(i) * mpart
   end if
   if (x1(i).gt.0.and.y2(i).le.Ng.and.z1(i).gt.0) then
      dense(int(x1(i)),int(y2(i)),int(z1(i))) = dense(int(x1(i)),int(y2(i)),int(z1(i))) &
         + dx1(i) * dy2(i) * dz1(i) * mpart
   end if
   if (x2(i).le.Ng.and.y2(i).le.Ng.and.z1(i).gt.0) then
      dense(int(x2(i)),int(y2(i)),int(z1(i))) = dense(int(x2(i)),int(y2(i)),int(z1(i))) &
         + dx2(i) * dy2(i) * dz1(i) * mpart
   end if
   if (x1(i).gt.0.and.y1(i).gt.0.and.z2(i).le.Ng) then
      dense(int(x1(i)),int(y1(i)),int(z2(i))) = dense(int(x1(i)),int(y1(i)),int(z2(i))) &
         + dx1(i) * dy1(i) * dz2(i) * mpart
   end if
   if (x2(i).le.Ng.and.y1(i).gt.0.and.z2(i).le.Ng) then
      dense(int(x2(i)),int(y1(i)),int(z2(i))) = dense(int(x2(i)),int(y1(i)),int(z2(i))) &
         + dx2(i) * dy1(i) * dz2(i) * mpart
   end if
   if (x1(i).gt.0.and.y2(i).le.Ng.and.z2(i).le.Ng) then
      dense(int(x1(i)),int(y2(i)),int(z2(i))) = dense(int(x1(i)),int(y2(i)),int(z2(i))) &
         + dx1(i) * dy2(i) * dz2(i) * mpart
   end if
   if (x2(i).le.Ng.and.y2(i).le.Ng.and.z2(i).le.Ng) then
      dense(int(x2(i)),int(y2(i)),int(z2(i))) = dense(int(x2(i)),int(y2(i)),int(z2(i))) &
         + dx2(i) * dy2(i) * dz2(i) * mpart
   end if
end do
!$omp end parallel do
There are no dependencies between iterations. Ideas?
This problem, as well as the one in your other question, comes from the fact that automatic heap arrays are disabled when OpenMP is enabled. This means that without -fopenmp, big arrays are automatically placed in static storage (known as the .bss segment) while small arrays are allocated on the stack. When you switch OpenMP support on, no automatic static allocation is used and your dense array gets allocated on the stack of the routine. The default stack limits on OS X are very restrictive, hence the segmentation fault.
You have several options here. The first option is to make dense have static allocation by giving it the SAVE attribute. The other option is to explicitly allocate it on the heap by making it ALLOCATABLE and then using the ALLOCATE statement, e.g.:
REAL, DIMENSION(:,:,:), ALLOCATABLE :: dense
ALLOCATE(dense(256,256,256))
! Computations, computations, computations
DEALLOCATE(dense)
Newer Fortran versions support automatic deallocation of arrays without the SAVE attribute when they go out of scope.
Note that your OpenMP directive is just fine and no additional data-sharing clauses are necessary. You do not need to declare i in a PRIVATE clause since loop counters have a predetermined private data-sharing class. You do not need to put the other variables in a SHARED clause as they are implicitly shared. Yet the operations that you do on dense should either be synchronised with ATOMIC UPDATE (or simply ATOMIC on older OpenMP implementations) or you should use REDUCTION(+:dense). Atomic updates are translated to locked additions and should not incur much of a slowdown, compared to the huge slowdown from having conditionals inside the loop:
INTEGER :: xi, yi, zi
!$OMP PARALLEL DO PRIVATE(xi,yi,zi)
...
   if (x1(i).gt.0.and.y1(i).gt.0.and.z1(i).gt.0) then
      xi = int(x1(i))
      yi = int(y1(i))
      zi = int(z1(i))
      !$OMP ATOMIC UPDATE
      dense(xi,yi,zi) = dense(xi,yi,zi) &
         + dx1(i) * dy1(i) * dz1(i) * mpart
   end if
...
Replicate the code with the proper changes for the other cases. If your compiler complains about the UPDATE clause in the ATOMIC construct, simply delete it.
REDUCTION(+:dense) would create one copy of dense in each thread, which would consume a lot of memory, and the reduction applied in the end would grow slower and slower with the size of dense. For small arrays it would work better than atomic updates.