Parallel processing loop using multiprocessing Pool

Question

I want to process a large for loop in parallel, and from what I have read, the best way to do this is to use the multiprocessing library that ships with Python's standard library.

I have a list of around 40,000 objects, and I want to process them in parallel in a separate class. The reason for doing this in a separate class is mainly because of what I read here.

In one class I have all the objects in a list. Using multiprocessing.Pool and Pool.map, I want to carry out a computation for each object in parallel by passing it through another class and getting a value back.

import multiprocessing

# ... some class that generates the list_objects
pool = multiprocessing.Pool(4)
results = pool.map(Parallel, self.list_objects)

And then I have the class that should process each object passed in by pool.map:

class Parallel(object):
    def __init__(self, args):
        self.some_variable          = args[0]
        self.some_other_variable    = args[1]
        self.yet_another_variable   = args[2]
        self.result                 = None

    def __call__(self):
        self.result                 = self.calculate(self.some_variable)

The reason I have a __call__ method is the post I linked above, but I'm not sure I'm using it correctly, as it seems to have no effect: the self.result value is never generated.

Any suggestions? Thanks!

Recommended answer

Use a plain function, not a class, when possible. Use a class only when there is a clear advantage to doing so.
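As a minimal sketch of the plain-function approach: process_item and process_all are hypothetical names, and squaring stands in for whatever the real per-object computation is. The function is defined at module level so that it can be pickled and sent to the worker processes.

```python
import multiprocessing


def process_item(item):
    """Hypothetical per-object computation; squaring stands in for the real work."""
    return item * item


def process_all(items, processes=4):
    """Apply process_item to every item using a pool of worker processes."""
    with multiprocessing.Pool(processes) as pool:
        # pool.map calls process_item(item) for each item and returns
        # the results as a list, in the same order as the input.
        return pool.map(process_item, items)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block
    # when they import the main module.
    print(process_all(range(10)))
```

The return value of the function is what ends up in the list returned by pool.map, so the function should return its result rather than store it somewhere.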

If you really need to use a class, then, given your setup, pass an instance of Parallel rather than the class itself:

results = pool.map(Parallel(args), self.list_objects)

Since the instance has a __call__ method, the instance itself is callable, like a function.

By the way, __call__ needs to accept an additional argument:

def __call__(self, val):

since pool.map essentially runs the equivalent of the following loop, with the calls distributed across the worker processes:

p = Parallel(args)
result = []
for val in self.list_objects:
    result.append(p(val))
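Putting the pieces together, a runnable sketch of the class-based version might look like the following. The body of calculate and the sample arguments are assumptions made for illustration, since the original post does not show them. Note that __call__ returns the value instead of storing it in self.result: each worker process operates on its own pickled copy of the instance, so attributes set inside a worker are not visible in the parent process.

```python
import multiprocessing


class Parallel(object):
    def __init__(self, args):
        # Shared settings; the instance is pickled and sent to the workers.
        self.some_variable = args[0]
        self.some_other_variable = args[1]

    def calculate(self, val):
        # Hypothetical computation; the original post does not show it.
        return self.some_variable * val + self.some_other_variable

    def __call__(self, val):
        # Return the result so that pool.map can collect it.
        return self.calculate(val)


if __name__ == "__main__":
    worker = Parallel((2, 1))  # sample arguments, assumed for illustration
    with multiprocessing.Pool(4) as pool:
        # Each element of the list is passed to worker.__call__ as val.
        results = pool.map(worker, [1, 2, 3])
    print(results)
```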
