Parallelizing Python code for different instances of a class

Question

My question is about parallelizing Python code: how can I run a method on different instances of a class in parallel to reduce the runtime?

What I have: I have multiple instances of a class A, stored in a list called instances. The class has a method add. There are several independent tasks, one per instance of A, and all of them take the same input (the number n in my example). Each instance applies add to n and returns a number. I want to collect the returned numbers from all instances in a list (results in my example).

What I want: As you can see, the tasks in this example can run in parallel because none of them has to wait for another to finish. How can I parallelize the simple code below? Since nothing is shared between the instances, I guess multithreading would work too, right? Or is multiprocessing the only way?

class A(object):
    def __init__(self, q):
        self.p = q

    def add(self, num):
        return self.p + num


instances = []
for i in range(5):
    instances.append(A(i))
n = 20
results = []
for inst in instances:
    results.append(inst.add(n))

print(results)

Output: [20, 21, 22, 23, 24]

Recommended answer

The pattern in your toy code suggests mapping a wrapper function over the list using a thread pool or a process pool. However, with so few instances and only a basic arithmetic operation per instance, the overhead of parallelizing would outweigh any potential benefit.

Whether this makes sense depends on the number of instances and on how long each method call takes. So do at least some basic profiling of your code before you try to parallelize it, and find out whether the tasks you want to parallelize are CPU-bound or I/O-bound. In standard CPython the global interpreter lock (GIL) prevents threads from executing Python bytecode at the same time, so threads mainly help with I/O-bound tasks, while CPU-bound tasks generally need separate processes.
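As a rough way to check the overhead before committing to a pool, the two approaches can be timed side by side. This is only a minimal sketch: the wrapper call_add and the pool size of 3 are assumptions made for illustration, and the timings it prints will vary from machine to machine.

```python
import time
from multiprocessing.dummy import Pool  # thread-based pool


class A(object):
    def __init__(self, q):
        self.p = q

    def add(self, num):
        return self.p + num


def call_add(inst, n=20):
    # Hypothetical wrapper (not from the original post): run the method on one instance.
    return inst.add(n)


instances = [A(i) for i in range(5)]

# Time the plain serial loop.
start = time.perf_counter()
serial = [call_add(inst) for inst in instances]
serial_seconds = time.perf_counter() - start

# Time the same work through a pool of 3 threads, including pool start-up.
start = time.perf_counter()
with Pool(3) as pool:
    pooled = pool.map(call_add, instances)
pooled_seconds = time.perf_counter() - start

print("serial:", serial_seconds, "thread pool:", pooled_seconds)
```

For work as small as add, the pool version will most likely be the slower one, since creating the pool and dispatching tasks costs more than the arithmetic itself. The comparison only becomes interesting once each call does substantial work.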

Here's an example that demonstrates the basic pattern:

# use multiprocessing.Pool for a process-based worker pool
# use multiprocessing.dummy.Pool for a thread-based worker pool
from multiprocessing.dummy import Pool
# make up list of instances
l = [list() for i in range(5)]
# function that calls the method on each instance
def foo(x):
    x.append(20)
    return x
# actually call functions and retrieve list of results
# the with-block shuts the pool down once map() has returned
with Pool(3) as p:
    results = p.map(foo, l)
print(results)

Obviously, you need to fill in the blanks to adapt this to your real code.
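One possible way to fill in those blanks for the class A from the question is sketched below. The wrapper call_add and the use of functools.partial to pass n are assumptions made for illustration, not part of the original answer.

```python
from functools import partial
from multiprocessing.dummy import Pool  # thread-based pool


class A(object):
    def __init__(self, q):
        self.p = q

    def add(self, num):
        return self.p + num


def call_add(inst, num):
    # Hypothetical wrapper: call the method on a single instance.
    return inst.add(num)


instances = [A(i) for i in range(5)]
n = 20

# map() returns the results in the same order as the input list.
with Pool(3) as pool:
    results = pool.map(partial(call_add, num=n), instances)

print(results)  # [20, 21, 22, 23, 24]
```

Switching the import to multiprocessing.Pool would give a process-based pool instead. In that case the instances and the wrapper must be picklable, and on platforms that spawn new processes the pool code should sit under an `if __name__ == "__main__":` guard.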

Further reading:

  • https://docs.python.org/3/library/multiprocessing.html
  • https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.dummy
  • How to use threading in Python?
  • https://wiki.python.org/moin/GlobalInterpreterLock
  • What is a global interpreter lock (GIL)?
  • What do the terms "CPU bound" and "I/O bound" mean?

Also have a look at futures:

  • https://pymotw.com/3/concurrent.futures/index.html#module-concurrent.futures
  • https://docs.python.org/3/library/concurrent.futures.html
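The same mapping pattern can also be written with concurrent.futures. This is a minimal sketch using ThreadPoolExecutor on the question's class A; the worker count of 3 is an arbitrary choice for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class A(object):
    def __init__(self, q):
        self.p = q

    def add(self, num):
        return self.p + num


instances = [A(i) for i in range(5)]
n = 20

# executor.map() yields results in input order; list() collects them.
with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(lambda inst: inst.add(n), instances))

print(results)  # [20, 21, 22, 23, 24]
```

The lambda works here because threads share memory. With ProcessPoolExecutor the callable has to be picklable, so a module-level function would be needed in place of the lambda.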

If you really want this to run in parallel, also consider porting your calculations to a GPU (you might then need to move away from Python).
