Debugging MPI Remotely Using GDB


Problem description

I am trying to debug code I wrote using MPI on a remotely accessed group of Pis. I cannot access the Pis directly, so I cannot use a GUI to debug the code.

I have tried using screen as shown in this question, but whenever I try to use screen I get this message:

There are not enough slots available in the system to satisfy the 2 slots
that were requested by the application:
  screen

Either request fewer slots for your application, or make more slots available
for use.

If I try to tell it to use just 1 screen, mpiexec fails:

mpiexec -N 16 --host 10.0.0.3 -np 1 screen -oversubscribe batSRTest3 shortpass.bat
--------------------------------------------------------------------------
mpiexec was unable to find the specified executable file, and therefore
did not launch the job.  This error was first reported for process
rank 0; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpiexec command
      line parameter option (remember that mpiexec interprets the first
      unrecognized command line token as the executable).

Node:       node1
Executable: screen

I have looked at the Open MPI FAQ, but the information does not apply to remote access. I tried following this part, but when I type in

gdb --pid

with the code running, nothing happens. Method 2 in that section will not work either, as I cannot open multiple windows when accessing the Pis using PuTTY.

I want to be able to debug it when running on all of the nodes ideally, and currently to run my program I have to use:

$ mpiexec -N 4 --host 10.0.0.3,10.0.0.4,10.0.0.5,10.0.0.6 -oversubscribe batSRTest shortpass.bat

Which is also causing confusion as I'm not even sure I am adding in the extra arguments correctly.

I did try debugging using gdb, similar to the answer shared here, but that just resulted in MPI failing since it wasn't given multiple tasks.

(gdb) exec-file batSRTest3
(gdb) run
Starting program: /home/pi/progs/batSRTest3 mpiexec -N 16 --host 10.0.0.3 -oversubscribe batSRTest3 shortpass.bat
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[Detaching after fork from child process 17157]
[New Thread 0x7691a460 (LWP 17162)]
[New Thread 0x75d3d460 (LWP 17163)]
[node1:17153] *** An error occurred in MPI_Group_incl
[node1:17153] *** reported by process [141361153,0]
[node1:17153] *** on communicator MPI_COMM_WORLD
[node1:17153] *** MPI_ERR_RANK: invalid rank
[node1:17153] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[node1:17153] ***    and potentially your MPI job)
[Thread 0x7691a460 (LWP 17162) exited]
[Thread 0x76ff5010 (LWP 17153) exited]
[Inferior 1 (process 17153) exited with code 06]
(gdb) q

Solution

The problem with debugging MPI applications is that they run in the form of multiple processes and often you do not have direct access to those processes. Therefore, special parallel debuggers exist that are able to integrate themselves inside the MPI job. The two most popular ones are TotalView and Arm DDT (formerly known as Allinea DDT). Both are expensive commercial products, but many academic institutions buy licenses, so check if that is the case with yours. The poor man's solution is to use GDB, which is not a parallel debugger per se, so one has to get creative.

In a nutshell, the idea is to launch your MPI processes under GDB's supervision. But first, let's look at how Open MPI executes a job on multiple nodes. The following diagram should illustrate it:

mpiexec <--+--> orted on node1 <--+--> rank 0
           |                      |
           |                      +--- rank 1
           |                      :
           |                      +--- rank N-1
           |
           +--- orted on node2 <--+--- rank N
           |                      |
           |                      +--- rank N+1
           |                      :
           :                      +--- rank 2N-1

mpiexec is the MPI program launcher, which is responsible for reading in information such as the number of MPI ranks, host lists, binding policy, etc., and using that information to launch the job. For processes on the same host as the one where mpiexec was executed, it simply spawns the executable a number of times. For processes on remote nodes, it uses RSH, SSH, or some other mechanism (srun for SLURM, TM2, etc.) to start on each remote host the orted helper program, which then spawns as many ranks on its particular host as necessary.

Unlike regular Unix programs, you never interact directly with the MPI processes through the console or via Unix signals. Instead, the MPI runtime provides mechanisms for I/O forwarding and signal propagation. You interact with the standard input and output of mpiexec, which then uses some infrastructure to send your input to rank 0 and to show you the output received from all ranks. Similarly, signals sent to mpiexec are translated and propagated to the MPI ranks. Neither I/O redirection nor signal propagation is fully specified in the MPI standard, because they are very platform-specific, but the general cluster implementation consensus is that the standard output of all ranks gets forwarded to the standard output of mpiexec while only rank 0 receives from the standard input; the rest of the ranks have their standard input connected to /dev/null. This is shown with directed arrows on the diagram above. Actually, Open MPI allows you to select which rank will receive the standard input by passing --stdin rank to mpiexec.

If you do gdb mpiexec ..., you are not debugging the MPI application. Instead, you will be debugging the MPI launcher itself, which isn't running your code. You need to interpose GDB between the MPI runtime and the MPI ranks themselves, i.e., the above diagram should transform into:

mpiexec <--+--> orted on node1 <--+--> gdb <---> rank 0
           |                      |
           |                      +--- gdb <---> rank 1
           |                      :
           |                      +--- gdb <---> rank N-1
           |
           +--- orted on node2 <--+--- gdb <---> rank N
           |                      |
           |                      +--- gdb <---> rank N+1
           |                      :
           :                      +--- gdb <---> rank 2N-1

The problem now becomes how to interact with that multitude of GDB instances, mostly because you can directly talk to only one of them. With TotalView and DDT, there is a GUI that talks to the debugger components using network sockets, so this problem is solved. With many GDBs, you have a couple of options (or rather, hacks).

The first option is to debug only a single misbehaving MPI rank. If the error always occurs in one and the same rank, you can have it run under the control of GDB while the rest run on their own, and then, if that rank is not 0, use --stdin rank to tell mpiexec to let you interact with the debugger. You need a simple wrapper script (called debug_rank.sh):

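#!/bin/sh
# Usage: debug_rank.sh <rank to debug> <executable> <arguments>

DEBUG_RANK=$1
shift
# Use POSIX "=" and quote both operands so the test also works in dash
if [ "$OMPI_COMM_WORLD_RANK" = "$DEBUG_RANK" ]; then
   # Only the selected rank runs under GDB
   exec gdb -ex=run --args "$@"
else
   # "$@" keeps arguments that contain spaces intact
   exec "$@"
fi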

The -ex=run tells GDB to automatically execute the run command after loading the executable. You may omit it if you need to set breakpoints first. Use the wrapper like this, for example to debug rank 3:

$ mpiexec ... --stdin 3 ./debug_rank.sh 3 batSRTest shortpass.bat

Once rank 3 does something bad or reaches a breakpoint, you'll be dropped into the GDB command prompt. You can also go without the wrapper script and run gdb directly, hoping that it won't drop into its command prompt on any other rank than the one you expect to be debugging. If that happens, GDB will exit because its standard input will be connected to /dev/null, bringing down the whole MPI job because mpiexec will notice one rank exiting without calling MPI_Finalize().
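The wrapper relies on OMPI_COMM_WORLD_RANK, which is specific to Open MPI. If you ever launch the same job with a different launcher, a small helper can look up the rank from whichever variable is set. This is only a sketch: the function names mpi_rank and is_debug_rank are made up for illustration, and PMI_RANK and SLURM_PROCID are assumed fallbacks that you should check against your own launcher.

```shell
#!/bin/sh
# Sketch: find this process's MPI rank from the environment.
# OMPI_COMM_WORLD_RANK is what debug_rank.sh uses (Open MPI);
# PMI_RANK and SLURM_PROCID are assumed fallbacks for other launchers.
mpi_rank() {
    if [ -n "${OMPI_COMM_WORLD_RANK:-}" ]; then
        echo "$OMPI_COMM_WORLD_RANK"
    elif [ -n "${PMI_RANK:-}" ]; then
        echo "$PMI_RANK"
    elif [ -n "${SLURM_PROCID:-}" ]; then
        echo "$SLURM_PROCID"
    else
        return 1   # no known rank variable is set
    fi
}

# Hypothetical predicate: succeeds only on the rank given as $1.
is_debug_rank() {
    rank=$(mpi_rank) || return 1
    [ "$rank" = "$1" ]
}
```

In the wrapper, the comparison could then become `if is_debug_rank "$DEBUG_RANK"; then ...`, leaving the rest unchanged.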

If you don't know which particular rank misbehaves, or if it varies from run to run, or if you want to set breakpoints in more than one of them, then you need to get around the input redirection problem. And the "simplest" solution is to use X11 terminal emulators such as xterm. The trick here is that GUI programs get their input from the windowing system and not from the standard input, so you can happily type in and send input to commands running inside xterm despite its standard input being connected to /dev/null. Also, X11 is a client/server protocol that can run over TCP/IP, allowing you to run xterm remotely and have it displayed on your local system when running some X11 implementation such as X.org or XWayland. That's exactly what the command shown on the Open MPI page does:

$ mpiexec ... xterm -e gdb -ex=run --args batSRTest shortpass.bat

This starts many copies of xterm and each copy executes gdb -ex=run --args batSRTest shortpass.bat. So you get many instances of GDB in their own terminal windows, which allows you to interact with any and all of them. For this to work, you need a couple of things:

  • there should be a copy of xterm installed on each Pi
  • your network should be a low-latency one because the X11 protocol runs terribly slowly on networks with longer delays
  • your X11 server should be reachable from all of the Pis and should be configured to accept connections from them
  • the DISPLAY environment variable should be set accordingly

Any X11 client application such as xterm uses the value in the DISPLAY environment variable to determine how to connect to the X11 server. Its value has the general form <optional hostname>:<display>[.<screen>]. For local servers managing a single display, DISPLAY is usually :0.0 or even just :0. When <optional hostname> is missing, the special value host/unix is implied, which means that the X11 server is listening on a Unix domain socket located in /tmp/.X11-unix/. By default, for security reasons X11 servers only listen on Unix domain sockets, which makes them unreachable for network clients. You need to enable listening on a TCP/IP socket and override the bind address, which is 127.0.0.1 by default, and to make sure your host is reachable from the Pis, i.e., that they can directly connect to your IP address on the TCP port that the X11 server listens on. If you go this way, then it works like this:

  1. Enable TCP connections for X11 and make it listen on a networked interface
  2. Examine the value of DISPLAY on your system
  3. Prepend your IP address
  4. Run the MPI job like this:

$ mpiexec ... -x DISPLAY=your.ip:d.s xterm -e gdb -ex=run --args batSRTest shortpass.bat

where d.s are the display and screen values that your local DISPLAY variable is set to. Make sure your firewall allows inbound TCP connections on port 6000+d.
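To make the DISPLAY arithmetic concrete, here is a minimal sketch of the two string operations involved: working out the TCP port from a DISPLAY value and prepending your address to it. The function names x11_port and remote_display are hypothetical, and the sketch assumes a plain IPv4 address or hostname (no IPv6 literals).

```shell
#!/bin/sh
# Sketch: derive the TCP port an X11 server would listen on from a
# DISPLAY value of the form [host]:display[.screen]. Port = 6000 + display.
x11_port() {
    disp=${1##*:}      # drop everything up to the last ':' -> "d" or "d.s"
    disp=${disp%%.*}   # drop the optional ".screen" suffix   -> "d"
    echo $(( 6000 + disp ))
}

# Sketch: prepend an address ($1) to a local DISPLAY value ($2) such as ":0.0".
remote_display() {
    echo "$1:${2##*:}"
}
```

For example, `remote_display 10.0.0.1 "$DISPLAY"` would give the value to pass with -x DISPLAY=..., and `x11_port "$DISPLAY"` the port to open in the firewall.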

Enabling TCP connections from the network is not always advisable or even possible, especially if you are behind NAT. Therefore, an alternative solution is to use X11 forwarding over SSH. For that, you need to pass -X or -Y to the SSH client when connecting to the SSH server:

 $ ssh -X username@server

-Y instead of -X enables trusted X11 forwarding, which is not subject to the X11 SECURITY extension restrictions, and may be required for some X11 applications. X11 forwarding only works if enabled on the server side. It also requires xauth to be installed on the server. But simply enabling X11 forwarding on the server is not enough, since by default the SSH server will listen on the loopback interface for X11 connections to forward. For OpenSSH, the following two configuration parameters must be set accordingly:

X11Forwarding yes    # Enable X11 forwarding
X11UseLocalhost no   # Listen on all network interfaces

If the SSH server is configured correctly and the xauth command is present, when you SSH into the system the value of DISPLAY should be something like hostname:10.0 and running netstat -an | grep 6010 should produce something like this:

tcp        0      0 0.0.0.0:6010            0.0.0.0:*               LISTEN
tcp6       0      0 :::6010                 :::*                    LISTEN

indicating that the X11 forwarding sockets are bound to all network interfaces. You should then launch the MPI job like this:

$ mpiexec -x DISPLAY=server.ip:10.0 xterm -e gdb -ex=run --args batSRTest shortpass.bat

where server.ip is the IP the server has in the network that connects it to the Pis (I suspect that would be 10.0.0.1 in your case). Also, a range of TCP ports starting with 6010 should be enabled in the server's firewall. The actual value depends on how many X11 forwarding sessions there are. By default, X11DisplayOffset is set to 10, so the SSH server will start with display 10 and go up until an unallocated display number is found. Also, if your home directory on the Pis is not somehow shared with that on the server (e.g., via NFS mounts), you also need to copy the .Xauthority file found in your home directory on the server to your home directory on all Pis. This file contains the MIT magic cookie needed to authenticate with the X11 forwarder and is regenerated each time you SSH into the server with X11 forwarding enabled, so make sure to copy it again to all Pis after each SSH login.
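Since the .Xauthority file has to be copied again after every login, a small loop saves some typing. The sketch below is an assumption-laden illustration: the host list is simply the one from the mpiexec line in the question, the function name push_xauthority and the DRY_RUN switch are made up, and it assumes the same user name on the server and on the Pis.

```shell
#!/bin/sh
# Sketch: copy ~/.Xauthority from the server to each Pi after an SSH login.
# The host list is taken from the question's mpiexec line; adjust as needed.
# With DRY_RUN=1 the scp commands are only printed, not executed.
PI_HOSTS="10.0.0.3 10.0.0.4 10.0.0.5 10.0.0.6"

push_xauthority() {
    for host in $PI_HOSTS; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "scp -p $HOME/.Xauthority $host:"
        else
            # "host:" with nothing after the colon is the remote home directory
            scp -p "$HOME/.Xauthority" "$host:" || return 1
        fi
    done
}
```

Run `push_xauthority` on the server right after logging in with ssh -X, before starting mpiexec.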

Now, if all this seems overly complex, GDB also has remote debugging abilities. You can start GDB on the server and then run remote MPI processes under the supervision of the GDB server program gdbserver, then use the remote debugging commands in the local GDB to connect to one of the GDB servers. This is quite cumbersome. You need to tell each GDB server to listen on a different port. A wrapper script (debug_server.sh) may help:

#!/bin/sh
# Usage: debug_server.sh <executable> <arguments>

GDB_HOST=$(hostname)
GDB_PORT=$(( 60000 + $OMPI_COMM_WORLD_RANK ))
echo "GDB server for rank $OMPI_COMM_WORLD_RANK available on $GDB_HOST:$GDB_PORT"
exec gdbserver ":$GDB_PORT" "$@"

Run like this:

$ mpiexec ... ./debug_server.sh batSRTest shortpass.bat

It will print the list of hostnames and ports that the different GDB server instances are listening on. Start GDB with no arguments and issue the following command:

(gdb) target remote hostname:port

where hostname is the IP (or hostname, if resolvable) of the Pi of interest and port is the port printed for the rank you want to debug. GDB server automatically breaks on the entry point of the executable, which will most likely be somewhere in the dynamic linker, and you need to issue the continue command to make it run. You need to do that for each GDB server instance, and I don't know a way to disconnect from the current target without stopping it, so you may need to start a load of GDBs too.
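Because debug_server.sh derives each port as 60000 plus the rank, you can rebuild the target string yourself instead of reading it off the job output. A rough sketch follows; the function names gdb_target and attach_rank are hypothetical, and attach_rank assumes you want to continue immediately rather than set breakpoints first.

```shell
#!/bin/sh
# Sketch: rebuild the "host:port" that debug_server.sh prints for a rank.
# The base port 60000 matches the one used in debug_server.sh.
gdb_target() {
    # $1 = host name or IP of the Pi, $2 = MPI rank
    echo "$1:$(( 60000 + $2 ))"
}

# Hypothetical convenience wrapper: connect a local GDB to one rank and
# let it run past the initial stop at the entry point.
attach_rank() {
    gdb -ex "target remote $(gdb_target "$1" "$2")" -ex continue
}
```

For example, `attach_rank 10.0.0.4 5` would connect to 10.0.0.4:60005. Drop the `-ex continue` part if you need to set breakpoints before the rank starts running.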

There may be some GUI to GDB that simplifies this. You may look into the Eclipse PTP project, which provides a parallel debugger, and see whether it works for you. You may find these slides useful. I personally have never used PTP and have no idea what it can do.


That's basically why most MPI debugging is done using printf() except for the most convoluted cases. Add --tag-output to the list of mpiexec arguments to make it prefix each output line with the job ID and rank ID it comes from so you don't have to print that information yourself.
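With tagged output you can also pick out a single rank afterwards. The filter below is a sketch that assumes each line starts with a tag of the form [jobid,rank]<stdout>: or [jobid,rank]<stderr>:, which is how I understand Open MPI's --tag-output to format it; check a line of your own output before relying on it. The function name rank_output is made up.

```shell
#!/bin/sh
# Sketch: keep only the lines produced by one rank.
# Assumes tags of the form "[jobid,rank]<stdout>:" at the start of each line.
rank_output() {
    # $1 = rank; tagged mpiexec output is read from standard input
    grep "^\[[0-9]*,$1]<std"
}
```

Typical use would be something like `mpiexec ... --tag-output batSRTest shortpass.bat 2>&1 | rank_output 3`.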
