multiprocessing.pool.MaybeEncodingError: 'TypeError("cannot serialize '_io.BufferedReader' object",)'


Question

Why does the code below work only with multiprocessing.dummy, but not with simple multiprocessing.

import urllib.request
#from multiprocessing.dummy import Pool #this works
from multiprocessing import Pool

urls = ['http://www.python.org', 'http://www.yahoo.com', 'http://www.scala.org', 'http://www.google.com']

if __name__ == '__main__':
    with Pool(5) as p:
        results = p.map(urllib.request.urlopen, urls)

Error:

Traceback (most recent call last):
  File "urlthreads.py", line 31, in <module>
    results = p.map(urllib.request.urlopen, urls)
  File "C:\Users\patri\Anaconda3\lib\multiprocessing\pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\patri\Anaconda3\lib\multiprocessing\pool.py", line 657, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[<http.client.HTTPResponse object at 0x0000016AEF204198>]'. Reason: 'TypeError("cannot serialize '_io.BufferedReader' object")'

What's missing so that it works without "dummy"?

Answer

The http.client.HTTPResponse object you get back from urlopen() has an _io.BufferedReader object attached, and this object cannot be pickled.

pickle.dumps(urllib.request.urlopen('http://www.python.org').fp)
Traceback (most recent call last):
...
    pickle.dumps(urllib.request.urlopen('http://www.python.org').fp)
TypeError: cannot serialize '_io.BufferedReader' object

multiprocessing.Pool needs to pickle (serialize) the results to send them back to the parent process, and that is what fails here. Since dummy uses threads instead of processes, no pickling takes place, because threads in the same process share their memory naturally.

The general solution for this TypeError is to:

  1. read out the buffer and save the content (if needed)
  2. remove the reference to the '_io.BufferedReader' from the object you are trying to pickle

In your case, calling .read() on the http.client.HTTPResponse will empty and remove the buffer, so a function that converts the response into something picklable can simply do this:

def read_buffer(response):
    # .read() drains the underlying buffered reader; keep the body
    # on the object so nothing is lost when the buffer goes away
    response.text = response.read()
    return response

Example:

r = urllib.request.urlopen('http://www.python.org')
r = read_buffer(r)
pickle.dumps(r)
# Out: b'\x80\x03chttp.client\nHTTPResponse\...
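A common alternative to patching the response object is to have the worker return only plain, picklable data in the first place. The following is a minimal sketch of that pattern; the fetch() helper and the throwaway local HTTP server are assumptions of this sketch (the server only stands in for the real URLs so it runs without internet access):

```python
import http.server
import threading
import urllib.request
from multiprocessing import Pool

def fetch(url):
    # Return plain picklable data (url, status, body bytes) instead of
    # the HTTPResponse; r.read() drains the non-picklable
    # _io.BufferedReader, and the tuple crosses the process
    # boundary without trouble.
    with urllib.request.urlopen(url) as r:
        return url, r.status, r.read()

if __name__ == '__main__':
    # Hypothetical local server standing in for the real url list.
    srv = http.server.HTTPServer(('127.0.0.1', 0),
                                 http.server.SimpleHTTPRequestHandler)
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    url = 'http://127.0.0.1:%d/' % srv.server_address[1]

    with Pool(2) as p:
        results = p.map(fetch, [url, url])  # results now pickle cleanly
    for u, status, body in results:
        print(u, status, len(body))
    srv.shutdown()
```

Keeping the worker's return value to built-in types (strings, ints, bytes) sidesteps the question of which response attributes are picklable entirely.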

Before you consider this approach, make sure you really need multiprocessing instead of multithreading. For an I/O-bound task like this one, multithreading is sufficient, since most of the time is spent waiting for the response anyway (no CPU time needed). Multiprocessing and the IPC involved also introduce substantial overhead.
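Besides multiprocessing.dummy, the stdlib concurrent.futures module offers the same thread-based pattern. A minimal sketch (the local stand-in server is an assumption here, used so the example runs offline; in practice you would pass the real url list):

```python
import http.server
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical local server standing in for the real URLs.
srv = http.server.HTTPServer(('127.0.0.1', 0),
                             http.server.SimpleHTTPRequestHandler)
threading.Thread(target=srv.serve_forever, daemon=True).start()
urls = ['http://127.0.0.1:%d/' % srv.server_address[1]] * 3

# Threads share the parent's memory, so the HTTPResponse objects are
# never pickled and no read_buffer() workaround is needed.
with ThreadPoolExecutor(max_workers=3) as ex:
    responses = list(ex.map(urllib.request.urlopen, urls))

statuses = [r.status for r in responses]
print(statuses)
for r in responses:
    r.close()
srv.shutdown()
```

Because no result ever crosses a process boundary, the original code's approach of mapping urlopen directly works unchanged here.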
