Can I use multiprocessing.Pool in a method of a class?

Problem description

I am trying to use multiprocessing in my code for better performance.

However, I got the following error:

Traceback (most recent call last):
  File "D:\EpubBuilder\TinyEpub.py", line 49, in <module>
    e.epub2txt()
  File "D:\EpubBuilder\TinyEpub.py", line 43, in epub2txt
    tempread = self.get_text()
  File "D:\EpubBuilder\TinyEpub.py", line 29, in get_text
    txtlist = pool.map(self.char2text,charlist)
  File "C:\Python34\lib\multiprocessing\pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Python34\lib\multiprocessing\pool.py", line 599, in get
    raise self._value
  File "C:\Python34\lib\multiprocessing\pool.py", line 383, in _handle_tasks
    put(task)
  File "C:\Python34\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "C:\Python34\lib\multiprocessing\reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot serialize '_io.BufferedReader' object

I also tried it another way and got this error:

TypeError: cannot serialize '_io.TextIOWrapper' object

My code is as follows:

from multiprocessing import Pool
class Book(object):
    def __init__(self, arg):
        self.namelist = arg
    def format_char(self,char):
        char = char + "a"
        return char
    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist] #list of char
        with Pool() as pool:
            txtlist = pool.map(self.format_char,charlist)
        self.tempread = "".join(txtlist)
        return self.tempread

if __name__ == '__main__':
    import os
    b = Book([open(f) for f in os.listdir()])
    t = b.format_book()
    print(t)

I think the error is raised because I am not using the Pool in the main function.

Is my conjecture right? And how can I modify my code to fix the error?

Recommended answer

The issue is that you've got an unpicklable instance variable (namelist) in the Book instance. Because you're calling pool.map on an instance method, and you're running on Windows, the entire instance needs to be picklable in order for it to be passed to the child process. Book.namelist holds open file objects (_io.BufferedReader), which can't be pickled. You can fix this in a couple of ways. Based on the example code, it looks like you could just make format_char a top-level function:

from multiprocessing import Pool


def format_char(char):
    char = char + "a"
    return char


class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist] #list of char
        with Pool() as pool:
            txtlist = pool.map(format_char,charlist)
        self.tempread = "".join(txtlist)
        return self.tempread
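
To see why the bound method drags the whole instance along, here is a minimal sketch using pickle directly rather than a Pool. The Holder class and the try_pickle helper are hypothetical names made up for this illustration, and os.devnull is used only so there is a real file to open:

```python
import os
import pickle


class Holder(object):
    """Hypothetical stand-in for Book: keeps an open file on the instance."""

    def __init__(self, fileobj):
        self.fileobj = fileobj

    def format_char(self, char):
        return char + "a"


def try_pickle(obj):
    """Return None if obj pickles cleanly, else the name of the exception raised."""
    try:
        pickle.dumps(obj)
        return None
    except Exception as exc:
        return type(exc).__name__


with open(os.devnull) as f:
    holder = Holder(f)
    # Pickling the bound method pickles the instance it is bound to,
    # which in turn tries to pickle the open file and fails.
    bound_method_error = try_pickle(holder.format_char)

print(bound_method_error)
```

A top-level function avoids this because it is pickled by reference to its name, so no instance state travels with it.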

However, if in reality you need format_char to be an instance method, you can use __getstate__/__setstate__ to make Book picklable by removing the namelist attribute from the instance's state before it is pickled:

from multiprocessing import Pool


class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def __getstate__(self):
        """ This is called before pickling. """
        state = self.__dict__.copy()
        del state['namelist']
        return state

    def __setstate__(self, state):
        """ This is called while unpickling. """
        self.__dict__.update(state)

    def format_char(self,char):
        char = char + "a"
        return char

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist] #list of char
        with Pool() as pool:
            txtlist = pool.map(self.format_char,charlist)
        self.tempread = "".join(txtlist)
        return self.tempread

This would be OK as long as you don't need to access namelist in the child process.
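
The effect of __getstate__ can be checked without starting any worker processes. The sketch below round-trips the bound method through pickle, which is roughly what happens when a task is sent to a worker. It assumes a cut-down Book containing only the parts needed for the check, and it opens os.devnull purely to have an unpicklable file object on the instance:

```python
import os
import pickle


class Book(object):
    def __init__(self, arg):
        self.namelist = arg
        self.tempread = ""

    def __getstate__(self):
        # Drop the open files from the state that gets pickled.
        state = self.__dict__.copy()
        del state['namelist']
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)

    def format_char(self, char):
        return char + "a"


with open(os.devnull) as f:
    book = Book([f])
    # Pickling the bound method now succeeds because namelist is left out.
    method_copy = pickle.loads(pickle.dumps(book.format_char))

result = method_copy("b")
# The unpickled copy has no namelist attribute at all.
has_namelist = hasattr(method_copy.__self__, "namelist")
print(result, has_namelist)
```

The copy can still run format_char, but any code in the child that touches self.namelist would raise AttributeError.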
