Python multiprocessing Process crashes silently

Problem description

I'm using Python 2.7.3. I have parallelised some code using subclassed multiprocessing.Process objects. If there are no errors in the code in my subclassed Process objects, everything runs fine. But if there are errors in the code in my subclassed Process objects, they will apparently crash silently (no stacktrace printed to the parent shell) and CPU usage will drop to zero. The parent code never crashes, giving the impression that execution is just hanging. Meanwhile it's really difficult to track down where the error in the code is because no indication is given as to where the error is.

I can't find any other questions on stackoverflow that deal with the same problem.

I guess the subclassed Process objects appear to crash silently because they can't print an error message to the parent's shell, but I would like to know what I can do about it so that I can at least debug more efficiently (and so that other users of my code can tell me when they run into problems too).

My actual code is too complex, but a trivial example of a subclassed Process object with an error in it would be something like this:

from multiprocessing import Process, Queue

class Worker(Process):

    def __init__(self, inputQueue, outputQueue):

        super(Worker, self).__init__()

        self.inputQueue = inputQueue
        self.outputQueue = outputQueue

    def run(self):

        for i in iter(self.inputQueue.get, 'STOP'):

            # (code that does stuff)

            1 / 0 # Dumb error

            # (more code that does stuff)

            self.outputQueue.put(result)
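To make the symptom concrete, here is a minimal parent-side driver (hypothetical, not part of the original question). Whether the child's traceback reaches the console depends on the Python version and environment, but the child's nonzero `exitcode` after `join()` is a reliable clue that it died on an exception:

```python
from multiprocessing import Process, Queue

class Worker(Process):
    def __init__(self, inputQueue, outputQueue):
        super(Worker, self).__init__()
        self.inputQueue = inputQueue
        self.outputQueue = outputQueue

    def run(self):
        for i in iter(self.inputQueue.get, 'STOP'):
            1 / 0  # Dumb error

if __name__ == '__main__':
    inq, outq = Queue(), Queue()
    worker = Worker(inq, outq)
    worker.start()
    inq.put(1)       # the first item triggers the ZeroDivisionError
    worker.join()
    # An unhandled exception in run() makes the child exit with code 1;
    # checking exitcode after join() is the cheapest way to detect it.
    print(worker.exitcode)
```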

Recommended answer

What you really want is some way to pass exceptions up to the parent process, right? Then you can handle them however you want.

If you use concurrent.futures.ProcessPoolExecutor, this is automatic. If you use multiprocessing.Pool, it's trivial. If you use explicit Process and Queue, you have to do a bit of work, but it's not that much.

For example:

def run(self):
    try:
        for i in iter(self.inputQueue.get, 'STOP'):
            # (code that does stuff)
            1 / 0 # Dumb error
            # (more code that does stuff)
            self.outputQueue.put(result)
    except Exception as e:
        self.outputQueue.put(e)

Then, your calling code can just read Exceptions off the queue like anything else. Instead of this:

yield outq.get()

do this:

result = outq.get()
if isinstance(result, Exception):
    raise result
yield result

(I don't know what your actual parent-process queue-reading code does, because your minimal sample just ignores the queue. But hopefully this explains the idea, even though your real code doesn't actually work like this.)

This assumes that you want to abort on any unhandled exception that makes it up to run. If you want to pass back the exception and continue on to the next i in iter, just move the try into the for, instead of around it.
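A per-item version of that wrapper might look like this (a sketch; `1 / i` is a stand-in for the real work, failing only when `i == 0`):

```python
from multiprocessing import Process, Queue

class Worker(Process):
    def __init__(self, inputQueue, outputQueue):
        super(Worker, self).__init__()
        self.inputQueue = inputQueue
        self.outputQueue = outputQueue

    def run(self):
        for i in iter(self.inputQueue.get, 'STOP'):
            try:
                # (code that does stuff); 1 / i fails when i == 0
                result = 1 / i
                self.outputQueue.put(result)
            except Exception as e:
                # report the failure for this item, then keep going
                self.outputQueue.put(e)
```

The worker survives the bad item and keeps processing; the parent just sees one `ZeroDivisionError` mixed in with the normal results.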

This also assumes that Exceptions are not valid values. If that's an issue, the simplest solution is to just push (result, exception) tuples:

def run(self):
    try:
        for i in iter(self.inputQueue.get, 'STOP'):
            # (code that does stuff)
            1 / 0 # Dumb error
            # (more code that does stuff)
            self.outputQueue.put((result, None))
    except Exception as e:
        self.outputQueue.put((None, e))

Then, the code that reads the queue does this:

result, exception = outq.get()
if exception:
    raise exception
yield result

You may notice that this is similar to the node.js callback style, where you pass (err, result) to every callback. Yes, it's annoying, and you're going to mess up code in that style. But you're not actually using that anywhere except in the wrapper; all of your "application-level" code that gets values off the queue or gets called inside run just sees normal returns/yields and raised exceptions.

You may even want to consider building a Future to the spec of concurrent.futures (or using that class as-is), even though you're doing your job queuing and executing manually. It's not that hard, and it gives you a very nice API, especially for debugging.
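As a sketch of that idea, `concurrent.futures.Future` works as-is even with no executor: the worker side calls `set_result`/`set_exception`, and the caller gets the usual `result()`/`exception()` API (the `run_job` helper here is hypothetical, just to show the shape):

```python
from concurrent.futures import Future

def run_job(fn, *args):
    # Manual execution, but the caller still gets the Future API:
    # result() re-raises the worker-side exception automatically.
    fut = Future()
    try:
        fut.set_result(fn(*args))
    except Exception as e:
        fut.set_exception(e)
    return fut

ok = run_job(lambda x: 1 / x, 2)
print(ok.result())            # the normal result
bad = run_job(lambda x: 1 / x, 0)
print(repr(bad.exception()))  # bad.result() would re-raise instead
```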

Finally, it's worth noting that most code built around workers and queues can be made a lot simpler with an executor/pool design, even if you're absolutely sure you only want one worker per queue. Just scrap all the boilerplate, and turn the loop in the Worker.run method into a function (which just returns or raises as normal, instead of appending to a queue). On the calling side, again scrap all the boilerplate and just submit or map the job function with its parameters.

Your whole example can be reduced to:

def job(i):
    # (code that does stuff)
    1 / 0 # Dumb error
    # (more code that does stuff)
    return result

import concurrent.futures

with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
    results = executor.map(job, range(10))

And it'll automatically handle exceptions properly.

As you mentioned in the comments, the traceback for an exception doesn't trace back into the child process; it only goes as far as the manual raise result call (or, if you're using a pool or executor, the guts of the pool or executor).

The reason is that multiprocessing.Queue is built on top of pickle, and pickling exceptions doesn't pickle their tracebacks. And the reason for that is that you can't pickle tracebacks. And the reason for that is that tracebacks are full of references to the local execution context, so making them work in another process would be very hard.

So… what can you do about this? Don't go looking for a fully general solution. Instead, think about what you actually need. 90% of the time, what you want is "log the exception, with traceback, and continue" or "print the exception, with traceback, to stderr and exit(1) like the default unhandled-exception handler". For either of those, you don't need to pass an exception at all; just format it on the child side and pass a string over. If you do need something more fancy, work out exactly what you need, and pass just enough information to manually put that together. If you don't know how to format tracebacks and exceptions, see the traceback module. It's pretty simple. And this means you don't need to get into the pickle machinery at all. (Not that it's very hard to copyreg a pickler or write a holder class with a __reduce__ method or anything, but if you don't need to, why learn all that?)
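For the common case, `traceback.format_exc()` on the child side is all it takes. A sketch (`child_side` is a hypothetical stand-in for the worker's `except` block):

```python
import traceback

def child_side():
    try:
        1 / 0  # Dumb error
    except Exception:
        # A string pickles fine over a Queue, and it carries the full
        # child-side traceback that a bare Exception object would lose.
        return traceback.format_exc()

tb_string = child_side()
print(tb_string)  # the familiar "Traceback (most recent call last): ..." text
```

The parent can then log the string, print it to stderr, or wrap it in a new exception, without ever trying to pickle a traceback object.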
