Debugging parallel Python programs (mpi4py)
Question
I have an mpi4py program that hangs intermittently. How can I trace what the individual processes are doing?
I can run the program in separate terminals, one per process, for example using pdb:
mpiexec -n 4 xterm -e "python -m pdb my_program.py"
But this gets cumbersome if the issue only manifests with a large number of processes (~80 in my case). In addition, it's easy to catch exceptions with pdb, but I'd need to see the trace to figure out where the hang occurs.
Recommended answer
The Python trace module allows you to trace program execution. To store the trace of each process separately, you need to wrap your code in a function:
def my_program(*args, **kwargs):
    # insert your code here
    pass
Then run it with trace.Trace.runfunc:
import sys
import trace

from mpi4py.MPI import COMM_WORLD

# define Trace object: trace line numbers at runtime, exclude some modules
tracer = trace.Trace(
    ignoredirs=[sys.prefix, sys.exec_prefix],
    ignoremods=[
        'inspect', 'contextlib', '_bootstrap',
        '_weakrefset', 'abc', 'posixpath', 'genericpath', 'textwrap'
    ],
    trace=1,
    count=0)

# by default the trace goes to stdout;
# redirect it to a different file for each process
sys.stdout = open('trace_{:04d}.txt'.format(COMM_WORLD.rank), 'w')
tracer.runfunc(my_program)
Now the trace of each process will be written to a separate file (trace_0000.txt, trace_0001.txt, etc.). Use the ignoredirs and ignoremods arguments to omit low-level calls.
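One practical detail when the goal is to locate a hang: a file opened in text mode is block-buffered by default, so the last traced lines may still be sitting in memory when the stuck process is killed. The following is a minimal sketch of the same approach with line buffering enabled, plus a small helper that reports the last traced line in each file. The names run_traced and last_traced_lines are hypothetical helpers introduced for illustration, and rank is passed as a plain argument (in an MPI run it would be COMM_WORLD.rank) so that the sketch also runs without mpi4py.

```python
import collections
import contextlib
import glob
import os
import sys
import trace


def run_traced(func, rank, directory="."):
    """Run func under trace.Trace, writing the trace to a per-rank file.

    `rank` is a plain integer here (assumption for illustration); in an
    MPI program it would come from mpi4py's COMM_WORLD.rank.
    """
    tracer = trace.Trace(
        ignoredirs=[sys.prefix, sys.exec_prefix],
        trace=1,
        count=0)
    path = os.path.join(directory, "trace_{:04d}.txt".format(rank))
    # buffering=1 requests line buffering in text mode, so each traced
    # line is written out as soon as it is printed.
    with open(path, "w", buffering=1) as out, contextlib.redirect_stdout(out):
        result = tracer.runfunc(func)
    return path, result


def last_traced_lines(directory="."):
    """Map each trace file name to its last non-empty line."""
    tails = {}
    for path in sorted(glob.glob(os.path.join(directory, "trace_*.txt"))):
        with open(path) as f:
            # A deque with maxlen=1 keeps only the final non-empty line,
            # so large trace files are not loaded into memory at once.
            tail = collections.deque(
                (line.rstrip() for line in f if line.strip()), maxlen=1)
        tails[os.path.basename(path)] = tail[0] if tail else ""
    return tails
```

After a hung run has been killed, calling last_traced_lines() in the output directory gives one line per rank; ranks whose last line points at a blocking communication call are the natural place to start looking. Note that anything the traced function prints to stdout ends up in the same file, just as with the sys.stdout reassignment above.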