Debugging parallel Python programs (mpi4py)


Question

I have an mpi4py program that hangs intermittently. How can I trace what the individual processes are doing?

I can run the program in separate terminals, for example using pdb:

mpiexec -n 4 xterm -e "python -m pdb my_program.py"

But this gets cumbersome if the issue only manifests with a large number of processes (~80 in my case). In addition, it's easy to catch exceptions with pdb, but I need to see the trace to figure out where the hang occurs.

Answer

The Python trace module allows you to trace program execution. In order to store the trace of each process separately, you need to wrap your code in a function:

def my_program(*args, **kwargs):
    # insert your code here
    pass

Then run it with trace.Trace.runfunc:

import sys
import trace

from mpi4py.MPI import COMM_WORLD

# define Trace object: trace line numbers at runtime, exclude some modules
tracer = trace.Trace(
    ignoredirs=[sys.prefix, sys.exec_prefix],
    ignoremods=[
        'inspect', 'contextlib', '_bootstrap',
        '_weakrefset', 'abc', 'posixpath', 'genericpath', 'textwrap'
    ],
    trace=1,
    count=0)

# by default the trace goes to stdout;
# redirect it to a different file for each process (one file per rank).
# buffering=1 makes the file line-buffered, so each traced line is written
# immediately and is not lost if a hung process has to be killed
sys.stdout = open('trace_{:04d}.txt'.format(COMM_WORLD.rank), 'w', buffering=1)

tracer.runfunc(my_program)
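The snippet above replaces sys.stdout for the rest of the process. Below is a minimal sketch of a variant that scopes the redirection to the traced call only. It rests on a few assumptions: trace_rank_to_file is a hypothetical helper (not part of trace or mpi4py), the rank is passed in as a plain integer so the sketch runs without MPI (in a real run you would pass COMM_WORLD.rank), and Python 3.8+ is assumed for the positional-only parameters. Extra arguments are forwarded to the traced function, since runfunc accepts *args and **kwargs.

```python
import contextlib
import sys
import trace


def trace_rank_to_file(func, rank, /, *args, **kwargs):
    """Run func(*args, **kwargs) under trace.Trace and write the line trace
    to trace_<rank>.txt. Hypothetical helper, not part of trace or mpi4py."""
    tracer = trace.Trace(
        # skip code under the interpreter's prefixes (stdlib, site-packages);
        # base_prefix/base_exec_prefix cover the stdlib when running in a venv
        ignoredirs=[sys.prefix, sys.exec_prefix,
                    sys.base_prefix, sys.base_exec_prefix],
        trace=1,   # print every executed line
        count=0)   # do not collect coverage counts
    filename = 'trace_{:04d}.txt'.format(rank)
    # buffering=1: line-buffered, so the last traced line reaches the file
    # even if the process hangs and has to be killed
    with open(filename, 'w', buffering=1) as out:
        # trace prints to sys.stdout; redirect it only for this call
        with contextlib.redirect_stdout(out):
            result = tracer.runfunc(func, *args, **kwargs)
    return result


def my_program(x, y=1):
    # stand-in for the real program body
    total = x + y
    return total


if __name__ == '__main__':
    # with mpi4py: from mpi4py.MPI import COMM_WORLD; rank = COMM_WORLD.rank
    print(trace_rank_to_file(my_program, 0, 2, y=3))  # prints 5
```

Under mpiexec, each rank would call trace_rank_to_file(my_program, COMM_WORLD.rank), which again yields one file per rank, while sys.stdout is restored once the traced function returns.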

Now the trace of each process is written to a separate file: trace_0000.txt for rank 0, trace_0001.txt for rank 1, and so on. Use the ignoredirs and ignoremods arguments to omit low-level calls.
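If the run hangs, the last line in each rank's file shows roughly where that rank stopped. Below is a minimal sketch for collecting those lines after killing the job. It assumes the trace_NNNN.txt naming from the code above, that the files were flushed (for example, opened line-buffered), and a hypothetical helper name last_traced_lines.

```python
import glob
import os


def last_traced_lines(pattern='trace_*.txt'):
    """Return {file name: last non-empty line} for every matching trace file.
    Hypothetical helper; an empty file maps to ''."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        last = ''
        with open(path) as f:
            for line in f:
                if line.strip():           # ignore blank lines
                    last = line.rstrip('\n')
        result[os.path.basename(path)] = last
    return result


if __name__ == '__main__':
    for name, line in last_traced_lines().items():
        print('{}: {}'.format(name, line))
```

Ranks whose last lines differ are a natural place to start looking, for example one rank waiting in a collective call that the others never reached. This is a heuristic, not a guarantee.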
