Why do ProcessPoolExecutor and Pool crash with a super() call in Python?

Problem Description

1. Why does the following Python code using the concurrent.futures module hang forever?

import concurrent.futures


class A:

    def f(self):
        print("called")


class B(A):

    def f(self):
        executor = concurrent.futures.ProcessPoolExecutor(max_workers=2)
        executor.submit(super().f)


if __name__ == "__main__":
    B().f()

The call raises an exception that is silently swallowed: [Errno 24] Too many open files. To see it, replace the line executor.submit(super().f) with print(executor.submit(super().f).exception()).
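For reference, here is B.f with that change applied:

class B(A):

    def f(self):
        executor = concurrent.futures.ProcessPoolExecutor(max_workers=2)
        # .exception() waits for the task to settle and returns the
        # exception that would otherwise be swallowed silently
        print(executor.submit(super().f).exception())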

However, replacing ProcessPoolExecutor with ThreadPoolExecutor prints "called" as expected.

2. Why does the following Python code using the multiprocessing.pool module raise the exception AssertionError: daemonic processes are not allowed to have children?

import multiprocessing.pool


class A:

    def f(self):
        print("called")


class B(A):

    def f(self):
        pool = multiprocessing.pool.Pool(2)
        pool.apply(super().f)


if __name__ == "__main__":
    B().f()

However, replacing Pool with ThreadPool prints "called" as expected.

Environment: CPython 3.7, macOS 10.14.

Recommended Answer

Both concurrent.futures.ProcessPoolExecutor and multiprocessing.pool.Pool use a multiprocessing.queues.Queue to pass the work function object from the caller to the worker process. Queue uses the pickle module to serialize and deserialize, but pickle fails to handle a bound method object correctly when the instance belongs to a child class:

import pickle

# Inside B.f, super().f is the function A.f bound to the B() instance:
f = super().f
print(f)

# Round-trip through pickle, as the pool's queue does internally:
pf = pickle.loads(pickle.dumps(f))
print(pf)

Output:

<bound method A.f of <__main__.B object at 0x104b24da0>>
<bound method B.f of <__main__.B object at 0x104cfab38>>

A.f becomes B.f, which effectively creates an infinite recursion of B.f calling B.f in the worker process. Each recursive call spawns yet another pool, which plausibly explains the [Errno 24] Too many open files in question 1; in question 2 the Pool workers are daemonic, so the nested Pool creation fails immediately with AssertionError: daemonic processes are not allowed to have children.

pickle.dumps relies on the bound method object's __reduce__ method. In my opinion, its implementation does not consider this scenario: it does not preserve the real underlying function, but only tries to get the method back from the instance (self, i.e. B()) by its simple name (f), which yields B.f. This is very likely a bug.
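Here is a minimal standalone sketch (independent of the example above; the class and variable names are just for illustration) of why reconstructing a method by name picks up the subclass override:

import pickle

class A:
    def f(self):
        print("called")

class B(A):
    def f(self):
        pass  # the override shadows A.f under the name "f"

b = B()
bound = A.f.__get__(b)  # what super().f evaluates to inside B.f
print(bound)            # <bound method A.f of <__main__.B object ...>>

# Looking the method up by name again goes through normal attribute
# lookup and finds the override, not the original function:
print(getattr(b, "f"))                    # <bound method B.f of ...>
print(pickle.loads(pickle.dumps(bound)))  # <bound method B.f of ...> too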

The good news is that, since we know where the issue is, we can fix it by implementing our own reduction function that recreates the bound method object from the original function (A.f) and the instance (B()):

import types
import copyreg
import multiprocessing

def my_reduce(obj):
    # obj.__func__ is the original function (A.f), obj.__self__ is the
    # instance (B()); __get__ re-binds the function to the instance on
    # unpickling instead of looking the method up again by name.
    return (obj.__func__.__get__, (obj.__self__,))

# Register for both plain pickle and multiprocessing's ForkingPickler:
copyreg.pickle(types.MethodType, my_reduce)
multiprocessing.reduction.register(types.MethodType, my_reduce)

We can do this because a method is created through the descriptor protocol: calling __get__ on the underlying function re-binds it to the instance.
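Putting it together, here is a sketch of the workaround applied to the first example (my own combination of the pieces, not code from the answer); it should print "called" instead of hanging:

import types
import copyreg
import multiprocessing
import concurrent.futures

def my_reduce(obj):
    # Re-bind the original function to the instance on unpickling.
    return (obj.__func__.__get__, (obj.__self__,))

copyreg.pickle(types.MethodType, my_reduce)
multiprocessing.reduction.register(types.MethodType, my_reduce)

class A:
    def f(self):
        print("called")

class B(A):
    def f(self):
        executor = concurrent.futures.ProcessPoolExecutor(max_workers=2)
        # result() waits for the worker and re-raises any error
        executor.submit(super().f).result()
        executor.shutdown()

if __name__ == "__main__":
    B().f()  # prints "called" from the worker process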

P.S. I have filed a bug report.
