Debugging MPI Remotely Using GDB

Problem Description

I am trying to debug code I wrote using MPI on a remotely accessed group of Pis. I cannot access the Pis directly, so I am unable to use a GUI to debug the code.

I have tried using screen as shown in this question, but any time I try to use screen I get this message:

There are not enough slots available in the system to satisfy the 2 slots
that were requested by the application:
  screen

Either request fewer slots for your application, or make more slots available
for use.

If I try to tell it to use just 1 screen, mpiexec fails:

mpiexec -N 16 --host 10.0.0.3 -np 1 screen -oversubscribe batSRTest3 shortpass.bat
--------------------------------------------------------------------------
mpiexec was unable to find the specified executable file, and therefore
did not launch the job.  This error was first reported for process
rank 0; it may have occurred for other processes as well.

NOTE: A common cause for this error is misspelling a mpiexec command
      line parameter option (remember that mpiexec interprets the first
      unrecognized command line token as the executable).

Node:       node1
Executable: screen

I have looked at the Open MPI FAQ, but the information does not apply to remote access. I tried following this part, but when I type in

gdb --pid

with the code running, nothing happens. Method 2 in that section also will not work, as I cannot open multiple windows when accessing the Pis using PuTTY.

Ideally, I want to be able to debug it while it runs on all of the nodes. Currently, to run my program I have to use:

$ mpiexec -N 4 --host 10.0.0.3,10.0.0.4,10.0.0.5,10.0.0.6 -oversubscribe batSRTest shortpass.bat

This is also causing confusion, as I'm not even sure I am adding the extra arguments correctly.

I did try debugging with gdb, similar to the answer shared here, but that just resulted in MPI failing since it wasn't given multiple tasks.

(gdb) exec-file batSRTest3
(gdb) run
Starting program: /home/pi/progs/batSRTest3 mpiexec -N 16 --host 10.0.0.3 -oversubscribe batSRTest3 shortpass.bat
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[Detaching after fork from child process 17157]
[New Thread 0x7691a460 (LWP 17162)]
[New Thread 0x75d3d460 (LWP 17163)]
[node1:17153] *** An error occurred in MPI_Group_incl
[node1:17153] *** reported by process [141361153,0]
[node1:17153] *** on communicator MPI_COMM_WORLD
[node1:17153] *** MPI_ERR_RANK: invalid rank
[node1:17153] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[node1:17153] ***    and potentially your MPI job)
[Thread 0x7691a460 (LWP 17162) exited]
[Thread 0x76ff5010 (LWP 17153) exited]
[Inferior 1 (process 17153) exited with code 06]
(gdb) q

Recommended Answer

The problem with debugging MPI applications is that they run in the form of multiple processes, and often you do not have direct access to those processes. Therefore, special parallel debuggers exist that are able to integrate themselves into the MPI job. The two most popular ones are TotalView and Arm DDT (formerly known as Allinea DDT). Both are expensive commercial products, but many academic institutions buy licenses, so check whether that is the case with yours. The poor man's solution is to use GDB, which is not a parallel debugger per se, so one has to get creative.

In a nutshell, the idea is to launch your MPI processes under GDB's supervision. But first, let's look at how Open MPI executes a job on multiple nodes. The following diagram should illustrate it:

mpiexec <--+--> orted on node1 <--+--> rank 0
           |                      |
           |                      +--- rank 1
           |                      :
           |                      +--- rank N-1
           |
           +--- orted on node2 <--+--- rank N
           |                      |
           |                      +--- rank N+1
           |                      :
           :                      +--- rank 2N-1

mpiexec is the MPI program launcher; it is responsible for reading in information such as the number of MPI ranks, the host list, the binding policy, etc., and for using that information to launch the job. For processes on the same host as the one where mpiexec was executed, it simply spawns the executable a number of times. For processes on remote nodes, it uses RSH, SSH, or some other mechanism (srun for SLURM, TM2, etc.) to start the orted helper program on each remote host, which then spawns as many ranks on its particular host as necessary.

Unlike regular Unix programs, you never interact directly with the MPI processes through the console or via Unix signals. Instead, the MPI runtime provides mechanisms for I/O forwarding and signal propagation. You interact with the standard input and output of mpiexec, which then uses some infrastructure to send your input to rank 0 and to show you the output received from all ranks. Similarly, signals sent to mpiexec are translated and propagated to the MPI ranks. Neither I/O redirection nor signal propagation are fully specified in the MPI standard, because they are very platform-specific, but the general cluster implementation consensus is that the standard output of all ranks gets forwarded to the standard output of mpiexec, while only rank 0 receives from the standard input; the rest of the ranks have their standard input connected to /dev/null. This is shown with directed arrows in the diagram above. Actually, Open MPI allows you to select which rank will receive the standard input by passing --stdin rank to mpiexec.

If you do gdb mpiexec ..., you are not debugging the MPI application. Instead, you will be debugging the MPI launcher itself, which isn't running your code. You need to interpose GDB between the MPI runtime and the MPI ranks themselves, i.e., the above diagram should transform into:

mpiexec <--+--> orted on node1 <--+--> gdb <---> rank 0
           |                      |
           |                      +--- gdb <---> rank 1
           |                      :
           |                      +--- gdb <---> rank N-1
           |
           +--- orted on node2 <--+--- gdb <---> rank N
           |                      |
           |                      +--- gdb <---> rank N+1
           |                      :
           :                      +--- gdb <---> rank 2N-1

The problem now becomes how to interact with that multitude of GDB instances, mostly because you can directly talk to only one of them. With TotalView and DDT, there is a GUI that talks to the debugger components using network sockets, so this problem is solved. With many GDBs, you have a couple of options (or rather, hacks).

The first option is to debug only a single misbehaving MPI rank. If the error always occurs in one and the same rank, you can have it run under the control of GDB while the rest run on their own, and then use --stdin rank to tell mpiexec to let you interact with the debugger if that rank is not 0. You need a simple wrapper script (called debug_rank.sh):

#!/bin/sh
# Usage: debug_rank.sh <rank to debug> <executable> <arguments>

DEBUG_RANK=$1
shift
# OMPI_COMM_WORLD_RANK is set by Open MPI for every spawned rank.
# Use -eq (POSIX sh has no == in test) and quote to be safe.
if [ "$OMPI_COMM_WORLD_RANK" -eq "$DEBUG_RANK" ]; then
   exec gdb -ex=run --args "$@"
else
   exec "$@"
fi

The -ex=run tells GDB to automatically execute the run command after loading the executable. You may omit it if you need to set breakpoints first. Use the wrapper like this, for example to debug rank 3:

$ mpiexec ... --stdin 3 ./debug_rank.sh 3 batSRTest shortpass.bat

Once rank 3 does something bad or reaches a breakpoint, you'll be dropped into the GDB command prompt. You can also go without the wrapper script and run gdb directly, hoping that it won't drop into its command prompt on any other rank than the one you expect to be debugging. If that happens, GDB will exit because its standard input will be connected to /dev/null, bringing down the whole MPI job because mpiexec will notice one rank exiting without calling MPI_Finalize().
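The branch logic in the wrapper can be sanity-checked locally, without launching MPI at all, by faking the OMPI_COMM_WORLD_RANK value that Open MPI exports to every rank. The helper function below is a sketch for illustration only, not part of the original answer:

```shell
#!/bin/sh
# Simulates the branch selection of debug_rank.sh. pick_branch is a
# hypothetical local helper; in a real job Open MPI sets
# OMPI_COMM_WORLD_RANK in each rank's environment.

pick_branch() {
    rank=$1         # simulated OMPI_COMM_WORLD_RANK
    debug_rank=$2   # the rank we want under GDB
    if [ "$rank" -eq "$debug_rank" ]; then
        echo gdb    # this rank would exec gdb -ex=run --args ...
    else
        echo plain  # this rank would exec the program directly
    fi
}

pick_branch 3 3   # prints: gdb
pick_branch 0 3   # prints: plain
```

Only the one rank whose number matches ends up under the debugger; all others run the executable untouched, which is exactly why the job keeps its full size.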

If you don't know which particular rank misbehaves, or if it varies from run to run, or if you want to set breakpoints in more than one of them, then you need to get around the input redirection problem. And the "simplest" solution is to use X11 terminal emulators such as xterm. The trick here is that GUI programs get their input from the windowing system and not from the standard input, so you can happily type in and send input to commands running inside xterm despite its standard input being connected to /dev/null. Also, X11 is a client/server protocol that can run over TCP/IP, allowing you to run xterm remotely and have it displayed on your local system when running some X11 implementation such as X.org or XWayland. That's exactly what the command shown on the Open MPI page does:

$ mpiexec ... xterm -e gdb -ex=run --args batSRTest shortpass.bat

This starts many copies of xterm and each copy executes gdb -ex=run --args batSRTest shortpass.bat. So you get many instances of GDB in their own terminal windows, which allows you to interact with any and all of them. For this to work, you need a couple of things:

  • there should be a copy of xterm installed on each Pi
  • your network should be a low-latency one because the X11 protocol runs terribly slow on networks with longer delays
  • your X11 server should be reachable from all of the Pis and should be configured to accept connections from them
  • the DISPLAY environment variable should be set accordingly
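The first requirement is easy to verify up front. A small helper along these lines (a hypothetical sketch, not from the original answer) run on each Pi, e.g. over SSH, reports whether a required command such as xterm is actually on the PATH before you waste an MPI launch on it:

```shell
#!/bin/sh
# check_tool prints "ok" if the named command is on PATH, "missing"
# otherwise. Run it on every Pi before launching the xterm/gdb job.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo ok
    else
        echo missing
    fi
}

check_tool sh      # sh is always present: prints ok
check_tool xterm   # prints ok only if xterm is installed on this host
```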

Any X11 client application such as xterm uses the value in the DISPLAY environment variable to determine how to connect to the X11 server. Its value has the general form <optional hostname>:<display>[.<screen>]. For local servers managing a single display, DISPLAY is usually :0.0 or even just :0. When <optional hostname> is missing, the special value host/unix is implied, which means that the X11 server is listening on a Unix domain socket located in /tmp/.X11-unix/. By default, for security reasons X11 servers only listen on Unix domain sockets, which makes them unreachable for network clients. You need to enable listening on a TCP/IP socket and override the bind address, which is 127.0.0.1 by default, and to make sure your host is reachable from the Pis, i.e., that they can directly connect to your IP address on the TCP port that the X11 server listens on. If you go this way, then it works like this:

  1. Enable TCP connections for X11 and make it listen on the network interfaces
  2. Check the value of DISPLAY on your system
  3. Prepend your own IP address to it
  4. Run the MPI job like this:

$ mpiexec ... -x DISPLAY=your.ip:d.s xterm -e gdb -ex=run --args batSRTest shortpass.bat

where d.s are the display and screen values that your local DISPLAY variable is set to. Make sure your firewall allows inbound TCP connections on port 6000+d.
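The display-to-port mapping can be worked out with plain parameter expansion; this sketch assumes a hypothetical DISPLAY value of the host:d.s form described above:

```shell
#!/bin/sh
# Derive the X11 TCP port (6000 + display number) from a DISPLAY value
# of the form host:d.s. The example value below is hypothetical.

DISPLAY_VALUE="localhost:10.0"

d=${DISPLAY_VALUE#*:}   # drop "host:"   -> "10.0"
d=${d%%.*}              # drop ".screen" -> "10"
port=$((6000 + d))

echo "$port"   # prints: 6010
```

So a local :0.0 display means opening port 6000, while an SSH-forwarded display like :10.0 means port 6010.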

Enabling TCP connections from the network is not always advisable or even possible, especially if you are behind NAT. Therefore, an alternative solution is to use X11 forwarding over SSH. For that, you need to pass -X or -Y to the SSH client when connecting to the SSH server:

 $ ssh -X username@server

-Y instead of -X enables trusted X11 forwarding, which is not subject to the X11 SECURITY extension restrictions and may be required for some X11 applications. X11 forwarding only works if it is enabled on the server side. It also requires that xauth be installed on the server. But simply enabling X11 forwarding on the server is not enough, since by default the SSH server will listen for X11 connections to forward on the loopback interface only. For OpenSSH, the following two configuration parameters must be set accordingly:

X11Forwarding yes    # Enable X11 forwarding
X11UseLocalhost no   # Listen on all network interfaces

If the SSH server is configured correctly and the xauth command is present, when you SSH into the system the value of DISPLAY should be something like hostname:10.0 and running netstat -an | grep 6010 should produce something like this:

tcp        0      0 0.0.0.0:6010            0.0.0.0:*               LISTEN
tcp6       0      0 :::6010                 :::*                    LISTEN

indicating that the X11 forwarding sockets are bound to all network interfaces. You should then launch the MPI job like this:

$ mpiexec -x DISPLAY=server.ip:10.0 xterm -e gdb -ex=run --args batSRTest shortpass.bat

where server.ip is the IP the server has in the network that connects it to the Pis (I suspect that would be 10.0.0.1 in your case). Also, a range of TCP ports starting with 6010 should be enabled in the server's firewall. The actual value depends on how many X11 forwarding sessions there are. By default, X11DisplayOffset is set to 10, so the SSH server will start with display 10 and go up until an unallocated display number is found. Also, if your home directory on the Pis is not somehow shared with that on the server (e.g., via NFS mounts), you also need to copy the .Xauthority file found in your home directory on the server to your home directory on all Pis. This file contains the MIT magic cookie needed to authenticate with the X11 forwarder and is regenerated each time you SSH into the server with X11 forwarding enabled, so make sure to copy it again to all Pis after each SSH login.
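Assuming the four Pi addresses from the question and the default pi user (both assumptions, adjust to your setup), the copy step could be a loop like the one below. It is shown as a dry run that only prints the scp commands; drop the leading echo to actually copy:

```shell
#!/bin/sh
# Dry run: print the scp commands that would push the fresh .Xauthority
# cookie to every Pi after each SSH login. The host list and the "pi"
# user are assumptions taken from the question.
for h in 10.0.0.3 10.0.0.4 10.0.0.5 10.0.0.6; do
    echo scp "$HOME/.Xauthority" "pi@$h:"
done
```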

Now, if all this seems overly complex, GDB also has remote debugging abilities. You can start GDB on the server and then run the remote MPI processes under the supervision of the GDB server program gdbserver, then use the remote debugging commands in your local GDB to connect to one of the GDB servers. This is quite cumbersome. You need to tell each GDB server to listen on a different port. A wrapper script (debug_server.sh) may help:

#!/bin/sh
# Usage: debug_server.sh <executable> <arguments>

GDB_HOST=$(hostname)
GDB_PORT=$(( 60000 + OMPI_COMM_WORLD_RANK ))
echo "GDB server for rank $OMPI_COMM_WORLD_RANK available on $GDB_HOST:$GDB_PORT"
exec gdbserver :"$GDB_PORT" "$@"

Run it like this:

$ mpiexec ... ./debug_server.sh batSRTest shortpass.bat
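The rank-to-port scheme of the wrapper can be verified locally by faking the rank variable; no gdbserver is needed for the arithmetic (a sketch, not part of the original answer):

```shell
#!/bin/sh
# Check the rank-to-port mapping of debug_server.sh: rank r listens on
# port 60000 + r. OMPI_COMM_WORLD_RANK is faked here; in a real job
# Open MPI sets it for each rank.
OMPI_COMM_WORLD_RANK=5
GDB_PORT=$(( 60000 + OMPI_COMM_WORLD_RANK ))
echo "$GDB_PORT"   # prints: 60005
```

Keeping the ports a fixed offset from the rank number makes it trivial to tell which gdbserver belongs to which rank when you connect.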

It will print the list of hostnames and ports that the different GDB server instances are listening on. Fire up GDB with no arguments and issue the following command:

(gdb) target remote hostname:port

where hostname is the IP (or hostname, if resolvable) of the Pi of interest and port is the port printed by the wrapper. The GDB server automatically breaks at the entry point of the executable, which will most likely be somewhere in the dynamic linker, and you need to issue the continue command to make it run. You need to do that for each GDB server instance, and I don't know of a way to disconnect from the current target without stopping it, so you may need to start a load of GDBs too.

There may be some GUI front-end to GDB that simplifies this. You could look into the Eclipse PTP project, which provides a parallel debugger, and see whether it works for you. You may find these slides useful. I personally have never used PTP and have no idea what it can do.

That's basically why most MPI debugging is done using printf() except for the most convoluted cases. Add --tag-output to the list of mpiexec arguments to make it prefix each output line with the job ID and rank ID it comes from so you don't have to print that information yourself.
