multiprocessing.pool.MaybeEncodingError: 'TypeError("cannot serialize '_io.BufferedReader' object",)'

Problem description

Why does the code below work only with multiprocessing.dummy, but not with plain multiprocessing?

import urllib.request
#from multiprocessing.dummy import Pool #this works
from multiprocessing import Pool

urls = ['http://www.python.org', 'http://www.yahoo.com','http://www.scala.org', 'http://www.google.com']

if __name__ == '__main__':
    with Pool(5) as p:
        results = p.map(urllib.request.urlopen, urls)

Error:

Traceback (most recent call last):
  File "urlthreads.py", line 31, in <module>
    results = p.map(urllib.request.urlopen, urls)
  File "C:\Users\patri\Anaconda3\lib\multiprocessing\pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\patri\Anaconda3\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[<http.client.HTTPResponse object at 0x0000016AEF204198>]'. Reason: 'TypeError("cannot serialize '_io.BufferedReader' object")'

What is missing to make this work without "dummy"?

Recommended answer

The http.client.HTTPResponse object returned by urlopen() has an _io.BufferedReader object attached, and this object cannot be pickled.

pickle.dumps(urllib.request.urlopen('http://www.python.org').fp)
Traceback (most recent call last):
...
    pickle.dumps(urllib.request.urlopen('http://www.python.org').fp)
TypeError: cannot serialize '_io.BufferedReader' object

multiprocessing.Pool has to pickle (serialize) the results to send them back to the parent process, and that fails here. Since multiprocessing.dummy uses threads instead of processes, no pickling takes place, because threads in the same process share their memory naturally.
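The pickling failure can be reproduced without any network access. The following is a minimal sketch, not part of the original answer: the helper name can_pickle is hypothetical, and an in-memory io.BytesIO stands in for the socket behind a real response.

```python
import io
import pickle


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


# An in-memory stand-in for the buffered reader attached to a response.
reader = io.BufferedReader(io.BytesIO(b"payload"))

print(can_pickle(reader))         # False: the reader itself cannot be pickled
print(can_pickle(reader.read()))  # True: the bytes it yields are plain data
```

Only the second kind of value can be sent from a worker process back to the parent.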

A general solution to this TypeError is:

  1. Read out the buffer and save its content (if needed).
  2. Remove the reference to the '_io.BufferedReader' from the object you are trying to pickle.

In your case, calling .read() on the http.client.HTTPResponse empties and removes the buffer, so a function that converts the response into something picklable could simply do this:

def read_buffer(response):
    response.text = response.read()
    return response

Example:

r = urllib.request.urlopen('http://www.python.org')
r = read_buffer(r)
pickle.dumps(r)
# Out: b'\x80\x03chttp.client\nHTTPResponse...
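A variation on the same idea, sketched here as an assumption and not taken from the original answer, is to do the reading inside the worker and return only the body bytes, so that nothing but plain data crosses the process boundary. The names fetch and fetch_all and the 10-second timeout are illustrative choices.

```python
import urllib.request
from multiprocessing import Pool


def fetch(url):
    """Open url in the worker and return only the body as bytes."""
    # Reading inside the worker means the response object, and the
    # unpicklable buffer attached to it, never leaves this process.
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def fetch_all(urls, workers=5):
    """Fetch all urls with a process pool; returns a list of bytes objects."""
    with Pool(workers) as pool:
        return pool.map(fetch, urls)
```

As in the question, fetch_all(urls) should be called under an if __name__ == '__main__': guard so that worker processes can import the module safely. If the status code or headers are needed, the worker can return them alongside the body as ordinary values such as an int and a dict.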

Before you consider this approach, make sure you really want to use multiprocessing instead of multithreading. For I/O-bound tasks like the one you have here, multithreading is sufficient, since most of the time is spent waiting for the response anyway (which needs no CPU time). Multiprocessing and the IPC it involves also introduce substantial overhead.
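For the threaded route, a minimal sketch based on the commented-out import in the question could look like the following. The function name open_all is hypothetical. Because the workers are threads, the response objects are handed back directly and are never pickled.

```python
import urllib.request
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool


def open_all(urls, workers=5):
    """Open every URL in a thread pool and return the live response objects."""
    with ThreadPool(workers) as pool:
        return pool.map(urllib.request.urlopen, urls)
```

Each returned response is still open, so the caller can call .read() on it and should close it afterwards.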
