Using GDB to debug an MPI program in Fortran

Views: 298. Published: 2018/3/16 17:31:26. Tags: c++, debugging, fortran, gdb, mpi


Problem description

I read this and arrived here, so now I think I should (if not, please tell me) rewrite the code

    {
        int i = 0;
        char hostname[256];
        gethostname(hostname, sizeof(hostname));
        printf("PID %d on %s ready for attach\n", getpid(), hostname);
        fflush(stdout);
        while (0 == i)
            sleep(5);
    }

in Fortran. From this answer I understood that in Fortran I could simply use MPI_Get_processor_name in place of gethostname. Everything else is simple except flush. What about it?

Where should I put it? In the main program after MPI_Init? And then? What should I do?

As for the compile options, I referred to this and used -v -da -Q as options to the mpifort wrapper.

This solution doesn't fit my case, since I need to run the program on at least 27 processes, so I'd like to check one process only.

Solution
Simplest approach:

What I actually often do is just run the MPI job locally and see what it does, without any of the above code. Then, if it hangs, I use top to find out the PIDs of the processes, and usually one can easily guess which rank is which from the PIDs (they tend to be consecutive and the lowest one is rank 0). In the listing below, rank 0 is process 1641, rank 1 is PID 1642, and so on.

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     1642 me        20   0  167328   7716   5816 R 100.0 0.047   0:25.02 a.out
     1644 me        20   0  167328   7656   5756 R 100.0 0.047   0:25.04 a.out
     1645 me        20   0  167328   7700   5792 R 100.0 0.047   0:24.97 a.out
     1646 me        20   0  167328   7736   5836 R 100.0 0.047   0:25.00 a.out
     1641 me        20   0  167328   7572   5668 R 99.67 0.046   0:24.95 a.out

Then I just run gdb -pid with the PID of the process I am interested in and examine the stack and local variables in that process (use help stack in the GDB console).

The most important thing is to get a backtrace, so just type bt in the console.

This works well when examining deadlocks, but less well when you have to stop at some specific place. In that case you have to attach the debugger early.

Your code:

I don't think the flush is necessary in Fortran. I think Fortran write and print flush as necessary, at least in the compilers I use.

But you definitely can use the flush statement

    use iso_fortran_env
    flush(output_unit)

Just put that flush after the write where you print the hostname and PID. But as I said, I would just start with printing alone.
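
As a minimal sketch of that flush, assuming a Fortran 2003 compiler, the following standalone program prints a message and flushes standard output. The program name flush_demo and the message text are made up for illustration; the process ID is left out here because getpid is a non-standard extension.

```fortran
program flush_demo
  ! output_unit is the unit number of standard output (Fortran 2003 and later)
  use, intrinsic :: iso_fortran_env, only: output_unit
  implicit none

  write(output_unit, '(a)') "ready for attach"
  ! Force any buffered output to be written before the program starts waiting
  flush(output_unit)
end program flush_demo
```

In the MPI program the same two statements would go right before the waiting loop, so the message is visible before the process blocks.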

What you then do is log in to that node and attach gdb to the right process with something like

    gdb -pid 12345

For sleep you can use the non-standard sleep intrinsic subroutine available in many compilers or write your own.
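
If your compiler lacks the sleep extension, one possible way to write your own is to call the C library's sleep through iso_c_binding. This is only a sketch and assumes a POSIX system; the names posix_sleep, c_sleep and sleep_demo are hypothetical. POSIX declares the argument and result as unsigned int, which is mapped to c_int here, so only small non-negative values should be passed.

```fortran
module posix_sleep
  use, intrinsic :: iso_c_binding, only: c_int
  implicit none
  private
  public :: c_sleep

  interface
    ! Binds to the POSIX routine: unsigned int sleep(unsigned int seconds);
    function c_sleep(seconds) bind(C, name="sleep") result(remaining)
      import :: c_int
      integer(c_int), value :: seconds
      integer(c_int) :: remaining   ! seconds left if interrupted by a signal
    end function c_sleep
  end interface
end module posix_sleep

program sleep_demo
  use, intrinsic :: iso_c_binding, only: c_int
  use posix_sleep, only: c_sleep
  implicit none
  integer(c_int) :: left

  left = c_sleep(1_c_int)   ! pause for about one second
  print *, "seconds left:", left
end program sleep_demo
```

In the waiting loop, a call such as left = c_sleep(1_c_int) would take the place of the non-standard call sleep(1).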

Whether before or after MPI_Init? If you want to print the rank, it must be after. Also for using MPI_Get_processor_name it must be after. It is normally recommended to call MPI_Init as early as possible in your program.

The code is then something like

    use mpi

    implicit none

    character(MPI_MAX_PROCESSOR_NAME) :: hostname
    integer :: rank, ie, pid, hostname_len
    integer, volatile :: i

    call MPI_Init(ie)

    call MPI_Get_processor_name(hostname, hostname_len, ie)

    !non-standard extension
    pid = getpid()

    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ie)

    write(*,*) "PID ", pid, " on ", trim(hostname), " ready for attach is world rank ", rank

    !this serves to block the execution at a specific place until you unblock it in GDB by setting i=0
    i = 1
    do
      !non-standard extension
      call sleep(1)
      if (i==0) exit
    end do

    end

Important note: if you compile with optimizations, then the compiler can see that i==0 is never true and will remove the check completely. You must lower your optimization level or declare i as volatile. Volatile means that the value can change at any time and the compiler must reload its value from memory for the check. The volatile attribute requires Fortran 2003.
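
Since the question asks about checking a single process out of 27, a variation on the code above is to let only one chosen rank wait for the debugger. This is a sketch and not part of the original answer: debug_rank and the program name attach_one_rank are hypothetical, the final MPI_Barrier is an optional choice to keep the other ranks from running ahead, and getpid and sleep remain non-standard extensions (available in gfortran, for example).

```fortran
program attach_one_rank
  use mpi
  implicit none

  ! Hypothetical setting: the only rank that waits for a debugger
  integer, parameter :: debug_rank = 0

  character(MPI_MAX_PROCESSOR_NAME) :: hostname
  integer :: rank, ie, pid, hostname_len
  integer, volatile :: i   ! volatile so the optimizer keeps the i==0 check

  call MPI_Init(ie)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ie)
  call MPI_Get_processor_name(hostname, hostname_len, ie)

  if (rank == debug_rank) then
    ! non-standard extension
    pid = getpid()
    write(*,*) "PID ", pid, " on ", trim(hostname), &
               " ready for attach is world rank ", rank

    ! Block here until i is set to 0 from GDB (set var i = 0)
    i = 1
    do
      ! non-standard extension
      call sleep(1)
      if (i == 0) exit
    end do
  end if

  ! Optional: the other ranks wait here until the debugged rank is released
  call MPI_Barrier(MPI_COMM_WORLD, ie)

  call MPI_Finalize(ie)
end program attach_one_rank
```

With this variant only one line is printed, so there is a single PID to pass to gdb -pid regardless of how many processes mpirun starts.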

Attaching the right process:

The above code will print, for example,

    > mpif90 -ggdb mpi_gdb.f90
    > mpirun -n 4 ./a.out
    PID 2356 on linux.site ready for attach is world rank 1
    PID 2357 on linux.site ready for attach is world rank 2
    PID 2358 on linux.site ready for attach is world rank 3
    PID 2355 on linux.site ready for attach is world rank 0

In top they look like

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     2355 me        20   0  167328   7452   5564 R 100.0 0.045   1:42.55 a.out
     2356 me        20   0  167328   7428   5548 R 100.0 0.045   1:42.54 a.out
     2357 me        20   0  167328   7384   5500 R 100.0 0.045   1:42.54 a.out
     2358 me        20   0  167328   7388   5512 R 100.0 0.045   1:42.51 a.out

and you just select which rank you want and execute

    gdb -pid 2355

to attach to rank 0 and so on. In a different terminal window, of course.

Then you get something like

    MAIN__ () at mpi_gdb.f90:26
    26          if (i==0) exit
    (gdb) info locals
    hostname = 'linux.site', ' ' <repeats 246 times>
    hostname_len = 10
    i = 1
    ie = 0
    pid = 2457
    rank = 0
    (gdb) set var i = 0
    (gdb) cont
    Continuing.
    [Inferior 1 (process 2355) exited normally]
