gfortran openmp segmentation fault occurs on basic do loop

Problem Description

I have a program which distributes particles into a cloud-in-cell mesh. It simply loops over the total number of particles (Ntot) and populates a 256^3 mesh (i.e. each particle gets distributed over 8 cells). I compile with:

% gfortran -fopenmp cic.f90 -o ./cic

This compiles fine. But when I run it (./cic) I get a segmentation fault. My loop is a classic omp do pattern. The program works when I don't compile it with OpenMP.

!$omp parallel do
 do i = 1,Ntot
   if (x1(i).gt.0.and.y1(i).gt.0.and.z1(i).gt.0) then
     dense(int(x1(i)),int(y1(i)),int(z1(i))) = dense(int(x1(i)),int(y1(i)),int(z1(i))) &
     + dx1(i) * dy1(i) * dz1(i) * mpart
   end if

   if (x2(i).le.Ng.and.y1(i).gt.0.and.z1(i).gt.0) then
     dense(int(x2(i)),int(y1(i)),int(z1(i))) = dense(int(x2(i)),int(y1(i)),int(z1(i))) &
     + dx2(i) * dy1(i) * dz1(i) * mpart
   end if

   if (x1(i).gt.0.and.y2(i).le.Ng.and.z1(i).gt.0) then
     dense(int(x1(i)),int(y2(i)),int(z1(i))) = dense(int(x1(i)),int(y2(i)),int(z1(i))) &
     + dx1(i) * dy2(i) * dz1(i) * mpart
   end if

   if (x2(i).le.Ng.and.y2(i).le.Ng.and.z1(i).gt.0) then
     dense(int(x2(i)),int(y2(i)),int(z1(i))) = dense(int(x2(i)),int(y2(i)),int(z1(i))) &
     + dx2(i) * dy2(i) * dz1(i) * mpart
   end if

   if (x1(i).gt.0.and.y1(i).gt.0.and.z2(i).le.Ng) then
     dense(int(x1(i)),int(y1(i)),int(z2(i))) = dense(int(x1(i)),int(y1(i)),int(z2(i))) &
     + dx1(i) * dy1(i) * dz2(i) * mpart
   end if

   if (x2(i).le.Ng.and.y1(i).gt.0.and.z2(i).le.Ng) then
     dense(int(x2(i)),int(y1(i)),int(z2(i))) = dense(int(x2(i)),int(y1(i)),int(z2(i))) &
     + dx2(i) * dy1(i) * dz2(i) * mpart
   end if

   if (x1(i).gt.0.and.y2(i).le.Ng.and.z2(i).le.Ng) then
     dense(int(x1(i)),int(y2(i)),int(z2(i))) = dense(int(x1(i)),int(y2(i)),int(z2(i))) &
     + dx1(i) * dy2(i) * dz2(i) * mpart
   end if

   if (x2(i).le.Ng.and.y2(i).le.Ng.and.z2(i).le.Ng) then
     dense(int(x2(i)),int(y2(i)),int(z2(i))) = dense(int(x2(i)),int(y2(i)),int(z2(i))) &
     +  dx2(i) * dy2(i) * dz2(i) * mpart
   end if
  end do
!$omp end parallel do

There are no dependencies between iterations. Ideas?

Recommended Answer

This problem, as well as the one in your other question, comes from the fact that automatic heap arrays are disabled when OpenMP is enabled. This means that without -fopenmp, big arrays are automatically placed in static storage (known as the .bss segment) while small arrays are allocated on the stack. When you switch OpenMP support on, no automatic static allocation is used and your dense array gets allocated on the stack of the routine. The default stack limits on OS X are very restrictive, hence the segmentation fault.

You have several options here. The first option is to make dense have static allocation by giving it the SAVE attribute. The other option is to explicitly allocate it on the heap by making it ALLOCATABLE and then using the ALLOCATE statement, e.g.:

REAL, DIMENSION(:,:,:), ALLOCATABLE :: dense

ALLOCATE(dense(256,256,256))

! Computations, computations, computations

DEALLOCATE(dense)

Newer Fortran versions support automatic deallocation of arrays without the SAVE attribute when they go out of scope.
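
For the first option, a minimal sketch of the SAVE variant (the REAL kind and the 256^3 extents are assumptions; adapt them to your actual declaration):

! SAVE gives the array static storage, so it no longer lives on the stack
REAL, DIMENSION(256,256,256), SAVE :: dense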

Note that your OpenMP directive is just fine and no additional data sharing clauses are necessary. You do not need to declare i in a PRIVATE clause since loop counters have predetermined private data-sharing class. You do not need to put the other variables in SHARED clause as they are implicitly shared. Yet the operations that you do on dense should either be synchronised with ATOMIC UPDATE (or simply ATOMIC on older OpenMP implementations) or you should use REDUCTION(+:dense). Atomic updates are translated to locked additions and should not incur much of a slowdown, compared to the huge slowdown from having conditionals inside the loop:

INTEGER :: xi, yi, zi

!$OMP PARALLEL DO PRIVATE(xi,yi,zi)
...
if (x1(i).gt.0.and.y1(i).gt.0.and.z1(i).gt.0) then
  ! compute the cell indices once, in thread-private temporaries
  xi = int(x1(i))
  yi = int(y1(i))
  zi = int(z1(i))
  ! serialise concurrent updates to the same cell
  !$OMP ATOMIC UPDATE
  dense(xi,yi,zi) = dense(xi,yi,zi) &
                  + dx1(i) * dy1(i) * dz1(i) * mpart
end if
...

Replicate the code with the proper changes for the other cases. If your compiler complains about the UPDATE clause in the ATOMIC construct, simply delete it.
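
For instance, the second conditional would become something like this (my replication of the same pattern, not taken verbatim from the answer):

if (x2(i).le.Ng.and.y1(i).gt.0.and.z1(i).gt.0) then
  xi = int(x2(i))
  yi = int(y1(i))
  zi = int(z1(i))
  !$OMP ATOMIC UPDATE
  dense(xi,yi,zi) = dense(xi,yi,zi) &
                  + dx2(i) * dy1(i) * dz1(i) * mpart
end if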

REDUCTION(+:dense) would create one copy of dense in each thread, which would consume a lot of memory and the reduction applied in the end would grow slower and slower with the size of dense. For small arrays it would work better than atomic updates.
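
For reference, a minimal sketch of the reduction variant (assuming your OpenMP implementation supports reductions over a whole Fortran array):

!$OMP PARALLEL DO REDUCTION(+:dense)
do i = 1,Ntot
  ! ... same loop body as in the question, no ATOMIC needed ...
end do
!$OMP END PARALLEL DO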
