Segmentation fault during MPI_FINALIZE() in Fortran


Problem description


I am getting a segmentation fault during a call to MPI_FINALIZE() in a Fortran 90 program. While the code is quite extensive, I will post the pseudocode and see if it raises any flags. I have a hunch (but have not yet tried this) that it could possibly be caused by not deallocating arrays. I'm not sure, however - can failure to deallocate arrays in Fortran 90 cause segmentation faults during a call to MPI_FINALIZE?

if(<rank 0>) then
  do iat = 1,natoms
    do il = 0, LMAX
      do im = -il,il
        <mpi_recv "rank_rdy"> ! find out which rank is ready for (at,l,m)
        <mpi_send "(iat,il,im)"> ! send (at,l,m) to the rank asking for it
      enddo
    enddo
  enddo
else ! other ranks send a 'ready' signal and receive the (at,l,m) to optimize
  if(<rank 0 is not finished processing (at,l,m)'s>) then
    <mpi_send "my_rank"> ! tell rank 0 that I am ready to receive
    <mpi_recv "(iat,il,im)"> ! receive (at,l,m) from rank 0
    call optimize(iat,il,im) ! do work on (at,l,m)
  endif
endif

if(<rank 0>) then
  <read temp files created by other ranks>
  <write temp files to one master file>
endif

print*, 'calling finalize'

call MPI_BARRIER(MPI_COMM_WORLD, ierr)
call MPI_FINALIZE()


Now on output I get, among other information not pertaining to this problem, the following:

 calling finalize
 calling finalize
 calling finalize
 calling finalize
 calling finalize
 calling finalize

=====================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)


I get the same problem even if I do not call MPI_BARRIER, but I thought that might help. Note that there are arrays used in every rank that I do not bother deallocating, because I use them through the entire program, so I am not worried about memory leaks or anything. Is it possible that this segfault is occurring due to MPI_FINALIZE() being called without freeing up memory?


I am going to explore this more on my own, but I wanted to post this question for a few reasons:

  1. Want to know why this happens (if it is actually the problem) when calling MPI_FINALIZE(). Internally, what is going on that causes this segfault?

  2. I have searched high and low online and found nothing about this problem, so for posterity this could be a good question to have answered on the web.


Edit: I forgot to mention this, but I am not able to duplicate this problem when running in serial. Obviously, I do not do the distribution of (at,l,m) in serial; the single process simply runs through all combinations and optimizes them one by one. I do not, however, deallocate the arrays that I think might be causing the problem in MPI, and I still do not get a segfault.

Answer


One should always use the Fortran 90 MPI interface if available instead of the old FORTRAN 77 interface. That is, you should always

USE mpi

instead of

INCLUDE 'mpif.h'


The difference between the two is that the Fortran 90 interface puts all MPI subroutines in a module, so explicit interfaces are generated for them. This allows the compiler to check the arguments in each call and to signal an error if you, for example, omit an argument.
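
That argument checking is exactly what catches the bug here: the question's call MPI_FINALIZE() omits the mandatory ierr output argument, so the fix is call MPI_FINALIZE(ierr). Below is a minimal sketch of the corrected pattern (a standalone hypothetical program, not the questioner's code):

program finalize_demo
  use mpi                    ! module interface: argument lists are checked
  implicit none
  integer :: ierr, rank

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  print*, 'calling finalize from rank', rank
  call MPI_FINALIZE(ierr)    ! ierr is mandatory in Fortran; under USE mpi the
                             ! compiler rejects MPI_FINALIZE() with no argument
end program finalize_demo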


In Fortran's calling convention all arguments are passed by address, irrespective of their type. This allows the compiler to generate proper calls to functions and subroutines without requiring prototypes as in C. But it also means that one can freely pass an INTEGER argument where an array of REAL is expected, and virtually all FORTRAN 77 compilers will happily compile such code; one can likewise pass fewer or more arguments than expected. There are external tools, usually called linters after the C tool lint, that parse the whole source tree and can pinpoint such errors and many others that the compiler would not care to find. One such tool that does static code analysis for Fortran is flint. Fortran 90 added interfaces in order to compensate for this error-prone nature of Fortran.
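
For illustration, here is a toy module (hypothetical names) showing the kind of compile-time check an explicit interface buys you; this is the same mechanism that USE mpi provides for the MPI routines:

module checked
contains
  subroutine set_status(ierr)
    integer, intent(out) :: ierr
    ierr = 0
  end subroutine set_status
end module checked

program demo
  use checked
  implicit none
  integer :: ierr
  call set_status(ierr)  ! OK: matches the explicit interface
  !call set_status()     ! rejected at compile time: missing actual argument
end program demo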


Calling a Fortran subroutine with fewer arguments than expected can have many different ill effects depending on the architecture, but in most cases it will result in a crash, especially if the omitted argument is an output one. The called subroutine doesn't know that fewer arguments are being passed - it just looks where the argument's address should be and takes whatever address it finds there. As ierr is an output argument, a write at that address occurs. There is a good chance that the address does not point to a virtual address that corresponds to mapped memory, and a hefty segmentation fault is delivered by the OS. Even if the address points somewhere in the user's allocated memory, the result could be an overwrite of an important value in some control structure. And if even that doesn't happen, there are calling conventions in which the callee cleans up the stack frame - in that case the stack pointer would be incorrectly incremented and the return address would be completely different from the right one, which would almost certainly lead to a jump to non-executable (or even non-mapped) memory and, again, to a segmentation fault.
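
A deliberately broken sketch of this failure mode follows (hypothetical file and routine names; the behavior is undefined, so a crash is likely but not guaranteed, and the two units must be compiled separately precisely because a modern compiler that sees both in one file would flag the mismatch):

! --- file: set_flag.f90 --- old-style subroutine, no explicit interface
subroutine set_flag(ierr)
  integer ierr
  ierr = 0              ! stores through whatever address the caller passed
end subroutine set_flag

! --- file: main.f90 --- compiled separately, so nothing checks the call
program crash_demo
  implicit none
  external set_flag
  call set_flag()       ! no address supplied: set_flag writes through a
                        ! garbage address left in the register/stack slot,
                        ! typically producing a segmentation fault
end program crash_demo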

