Python: Efficient workaround for multiprocessing a function that is a data member of a class, from within that class

Question
I'm aware of various discussions of limitations of the multiprocessing module when dealing with functions that are data members of a class (due to Pickling problems).
But is there another module, or any sort of workaround in multiprocessing, that allows something like the following (specifically, without forcing the function that is applied in parallel to be defined outside of the class)?
class MyClass():
    def __init__(self):
        self.my_args = [1,2,3,4]
        self.output = {}

    def my_single_function(self, arg):
        return arg**2

    def my_parallelized_function(self):
        # Use map or map_async to map my_single_function onto the
        # list of self.my_args, and append the return values into
        # self.output, using each arg in my_args as the key.
        # The result should make self.output become
        # {1:1, 2:4, 3:9, 4:16}

foo = MyClass()
foo.my_parallelized_function()
print foo.output
Note: I can easily do this by moving my_single_function outside of the class, and passing something like foo.my_args to the map or map_async commands. But this pushes the parallelized execution of the function outside of instances of MyClass.
For my application (parallelizing a large data query that retrieves, joins, and cleans monthly cross-sections of data, and then appends them into a long time-series of such cross-sections), it is very important to have this functionality inside the class since different users of my program will instantiate different instances of the class with different time intervals, different time increments, different sub-sets of data to gather, and so on, that should all be associated with that instance.
Thus, I want the work of parallelizing to also be done by the instance, since it owns all the data relevant to the parallelized query, and it would be silly to write some hacky wrapper function that binds to some arguments and lives outside of the class (especially since such a function would be non-general: it would need all kinds of specifics from inside the class).
Answer
Steven Bethard has posted a way to allow methods to be pickled/unpickled. You could use it like this:
import multiprocessing as mp
import copy_reg
import types

def _pickle_method(method):
    # Author: Steven Bethard
    # http://bytes.com/topic/python/answers/552476-why-cant-you-pickle-instancemethods
    func_name = method.im_func.__name__
    obj = method.im_self
    cls = method.im_class
    cls_name = ''
    if func_name.startswith('__') and not func_name.endswith('__'):
        cls_name = cls.__name__.lstrip('_')
    if cls_name:
        func_name = '_' + cls_name + func_name
    return _unpickle_method, (func_name, obj, cls)

def _unpickle_method(func_name, obj, cls):
    # Author: Steven Bethard
    # http://bytes.com/topic/python/answers/552476-why-cant-you-pickle-instancemethods
    for cls in cls.mro():
        try:
            func = cls.__dict__[func_name]
        except KeyError:
            pass
        else:
            break
    return func.__get__(obj, cls)

# This call to copy_reg.pickle allows you to pass methods as the first arg to
# mp.Pool methods. If you comment out this line, `pool.map(self.foo, ...)` results in
# PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup
# __builtin__.instancemethod failed
copy_reg.pickle(types.MethodType, _pickle_method, _unpickle_method)

class MyClass(object):
    def __init__(self):
        self.my_args = [1,2,3,4]
        self.output = {}

    def my_single_function(self, arg):
        return arg**2

    def my_parallelized_function(self):
        # Use map or map_async to map my_single_function onto the
        # list of self.my_args, and append the return values into
        # self.output, using each arg in my_args as the key.
        # The result should make self.output become
        # {1:1, 2:4, 3:9, 4:16}
        # Note: this relies on the module-level `pool` created below.
        self.output = dict(zip(self.my_args,
                               pool.map(self.my_single_function,
                                        self.my_args)))
Then
pool = mp.Pool()
foo = MyClass()
foo.my_parallelized_function()
yields
print foo.output
# {1: 1, 2: 4, 3: 9, 4: 16}
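A side note for readers on Python 3 (the answer above targets Python 2, where copy_reg and im_func live): bound methods pickle out of the box there, so no copy_reg/copyreg registration is needed at all, as long as the instance itself is picklable. A minimal sketch of the same class in Python 3 syntax:

```python
import multiprocessing as mp

class MyClass(object):
    def __init__(self):
        self.my_args = [1, 2, 3, 4]
        self.output = {}

    def my_single_function(self, arg):
        return arg ** 2

    def my_parallelized_function(self):
        pool = mp.Pool()
        try:
            # On Python 3, pickling self.my_single_function pickles the
            # instance plus the method name, so the bound method can be
            # passed to pool.map directly.
            self.output = dict(zip(self.my_args,
                                   pool.map(self.my_single_function,
                                            self.my_args)))
        finally:
            pool.close()
            pool.join()

if __name__ == '__main__':
    foo = MyClass()
    foo.my_parallelized_function()
    # foo.output is now {1: 1, 2: 4, 3: 9, 4: 16}
```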