gfortran openmp segmentation fault occurs on basic do loop


Problem description



I have a program which distributes particles into a cloud-in-cell mesh. It simply loops over the total number of particles (Ntot) and populates a 256^3 mesh (i.e. each particle gets distributed over 8 cells).

% gfortran -fopenmp cic.f90 -o ./cic

This compiles fine, but when I run it (./cic) I get a segmentation fault. My loop is a classic omp do problem. The program works when I don't compile it with openmp.

!$omp parallel do
 do i = 1,Ntot
   if (x1(i).gt.0.and.y1(i).gt.0.and.z1(i).gt.0) then
     dense(int(x1(i)),int(y1(i)),int(z1(i))) = dense(int(x1(i)),int(y1(i)),int(z1(i))) &
     + dx1(i) * dy1(i) * dz1(i) * mpart
   end if

   if (x2(i).le.Ng.and.y1(i).gt.0.and.z1(i).gt.0) then
     dense(int(x2(i)),int(y1(i)),int(z1(i))) = dense(int(x2(i)),int(y1(i)),int(z1(i))) &
     + dx2(i) * dy1(i) * dz1(i) * mpart
   end if

   if (x1(i).gt.0.and.y2(i).le.Ng.and.z1(i).gt.0) then
     dense(int(x1(i)),int(y2(i)),int(z1(i))) = dense(int(x1(i)),int(y2(i)),int(z1(i))) &
     + dx1(i) * dy2(i) * dz1(i) * mpart
   end if

   if (x2(i).le.Ng.and.y2(i).le.Ng.and.z1(i).gt.0) then
     dense(int(x2(i)),int(y2(i)),int(z1(i))) = dense(int(x2(i)),int(y2(i)),int(z1(i))) &
     + dx2(i) * dy2(i) * dz1(i) * mpart
   end if

   if (x1(i).gt.0.and.y1(i).gt.0.and.z2(i).le.Ng) then
     dense(int(x1(i)),int(y1(i)),int(z2(i))) = dense(int(x1(i)),int(y1(i)),int(z2(i))) &
     + dx1(i) * dy1(i) * dz2(i) * mpart
   end if

   if (x2(i).le.Ng.and.y1(i).gt.0.and.z2(i).le.Ng) then
     dense(int(x2(i)),int(y1(i)),int(z2(i))) = dense(int(x2(i)),int(y1(i)),int(z2(i))) &
     + dx2(i) * dy1(i) * dz2(i) * mpart
   end if

   if (x1(i).gt.0.and.y2(i).le.Ng.and.z2(i).le.Ng) then
     dense(int(x1(i)),int(y2(i)),int(z2(i))) = dense(int(x1(i)),int(y2(i)),int(z2(i))) &
     + dx1(i) * dy2(i) * dz2(i) * mpart
   end if

   if (x2(i).le.Ng.and.y2(i).le.Ng.and.z2(i).le.Ng) then
     dense(int(x2(i)),int(y2(i)),int(z2(i))) = dense(int(x2(i)),int(y2(i)),int(z2(i))) &
     +  dx2(i) * dy2(i) * dz2(i) * mpart
   end if
  end do
!$omp end parallel do

There are no dependencies between iterations. Ideas?

Solution

This problem, as well as the one in your other question, comes from the fact that automatic heap arrays are disabled when OpenMP is enabled. This means that without -fopenmp, big arrays are automatically placed in static storage (known as the .bss segment) while small arrays are allocated on the stack. When you switch OpenMP support on, no automatic static allocation is used and your dense array gets allocated on the stack of the routine. The default stack limits on OS X are very restrictive, hence the segmentation fault.
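For reference, the failing layout is presumably something like the sketch below (the actual declaration is not shown in the question): a large fixed-size local array which, with -fopenmp, has to fit on the routine's stack.

REAL, DIMENSION(256,256,256) :: dense   ! local array: 256^3 default reals = 64 MiB, far above the default stack limit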

You have several options here. The first option is to make dense have static allocation by giving it the SAVE attribute. The other option is to explicitly allocate it on the heap by making it ALLOCATABLE and then using the ALLOCATE statement, e.g.:

REAL, DIMENSION(:,:,:), ALLOCATABLE :: dense

ALLOCATE(dense(256,256,256))

! Computations, computations, computations

DEALLOCATE(dense)
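For completeness, the SAVE option mentioned above only requires changing the declaration (a sketch):

REAL, DIMENSION(256,256,256), SAVE :: dense   ! SAVE places the array in static storage, keeping it off the stack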

Newer Fortran versions support automatic deallocation of arrays without the SAVE attribute when they go out of scope.

Note that your OpenMP directive is just fine and no additional data sharing clauses are necessary. You do not need to declare i in a PRIVATE clause since loop counters have predetermined private data-sharing class. You do not need to put the other variables in SHARED clause as they are implicitly shared. Yet the operations that you do on dense should either be synchronised with ATOMIC UPDATE (or simply ATOMIC on older OpenMP implementations) or you should use REDUCTION(+:dense). Atomic updates are translated to locked additions and should not incur much of a slowdown, compared to the huge slowdown from having conditionals inside the loop:

INTEGER :: xi, yi, zi

!$OMP PARALLEL DO PRIVATE(xi,yi,zi)
...
if (x1(i).gt.0.and.y1(i).gt.0.and.z1(i).gt.0) then
  xi = int(x1(i))
  yi = int(y1(i))
  zi = int(z1(i))
  !$OMP ATOMIC UPDATE
  dense(xi,yi,zi) = dense(xi,yi,zi) &
                  + dx1(i) * dy1(i) * dz1(i) * mpart
end if
...

Replicate the code with the proper changes for the other cases. If your compiler complains about the UPDATE clause in the ATOMIC construct, simply delete it.
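For instance, following the same pattern, the second conditional from the question would become (a sketch using the same variables):

if (x2(i).le.Ng.and.y1(i).gt.0.and.z1(i).gt.0) then
  xi = int(x2(i))
  yi = int(y1(i))
  zi = int(z1(i))
  !$OMP ATOMIC UPDATE
  dense(xi,yi,zi) = dense(xi,yi,zi) &
                  + dx2(i) * dy1(i) * dz1(i) * mpart
end if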

REDUCTION(+:dense) would create one copy of dense in each thread, which would consume a lot of memory and the reduction applied in the end would grow slower and slower with the size of dense. For small arrays it would work better than atomic updates.
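For comparison, a sketch of the reduction variant (each thread then updates its own private copy of dense, and the copies are summed when the loop ends):

!$OMP PARALLEL DO REDUCTION(+:dense)
do i = 1,Ntot
  if (x1(i).gt.0.and.y1(i).gt.0.and.z1(i).gt.0) then
    dense(int(x1(i)),int(y1(i)),int(z1(i))) = dense(int(x1(i)),int(y1(i)),int(z1(i))) &
    + dx1(i) * dy1(i) * dz1(i) * mpart
  end if
  ! ... the remaining seven cell updates stay exactly as in the question ...
end do
!$OMP END PARALLEL DO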
