Using threading/multiprocessing in Python to do multiple calculations at the same time
Problem description
I have a list of numbers. I want to perform some time-consuming operation on each number in the list and make a new list with all the results. Here's a simplified version of what I have:
def calcNum(n):  # some arbitrary, time-consuming calculation on a number
    m = n
    for i in range(5000000):
        m += i % 25
        if m > n*n:
            m /= 2
    return m

nums = [12, 25, 76, 38, 8, 2, 5]
finList = []

for i in nums:
    return_val = calcNum(i)
    finList.append(return_val)

print(finList)
Now I want to take advantage of the multiple cores in my CPU and give each of them the task of processing one of the numbers. Since the "number calculation" function is self-contained from start to finish, I figured this would be fairly simple to do and a perfect situation for multiprocessing/threading.
My question is, which one should I use (multiprocessing or threading?), and what is the simplest way to do this?
I did a test with various code I found in other questions to achieve this, and while it runs fine it doesn't seem to be doing any actual multithreading/processing and takes just as long as my first test:
from multiprocessing.pool import ThreadPool

def calcNum(n):  # some arbitrary, time-consuming calculation on a number
    m = n
    for i in range(5000000):
        m += i % 25
        if m > n*n:
            m /= 2
    return m

pool = ThreadPool(processes=3)

nums = [12, 25, 76, 38, 8, 2, 5]
finList = []

for i in nums:
    async_result = pool.apply_async(calcNum, (i,))
    return_val = async_result.get()
    finList.append(return_val)

print(finList)
Recommended answer
multiprocessing.pool and pool.map are your best friends here. They save a lot of headache by hiding the queues and other plumbing you would otherwise need to make this work. All you need to do is set up the pool, give it the maximum number of processes, and point it at the function and the iterable. See the working code below.
pool.map blocks until every task has finished, and the close and join that follow wait for the worker processes to exit, so the program waits until ALL processes have returned something before giving you the result.
from multiprocessing.pool import Pool

def calcNum(n):  # some arbitrary, time-consuming calculation on a number
    print("Calcs Started on", n)
    m = n
    for i in range(5000000):
        m += i % 25
        if m > n*n:
            m //= 2  # floor division, which is what "/" did on ints in Python 2
    return m

if __name__ == "__main__":
    p = Pool(processes=3)

    nums = [12, 25, 76, 38, 8, 2, 5]

    result = p.map(calcNum, nums)  # blocks until every number is processed

    p.close()  # no more tasks will be submitted
    p.join()   # wait for the worker processes to exit

    print(result)
That will give you something like this:
Calcs Started on 12
Calcs Started on 25
Calcs Started on 76
Calcs Started on 38
Calcs Started on 8
Calcs Started on 2
Calcs Started on 5
[72, 562, 5123, 1270, 43, 23, 23]
Regardless of when each process is started or when it completes, map waits for each to finish and then puts them all back in the correct order (corresponding to the input iterable).
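The ordering guarantee can be seen with a small sketch in which the tasks are made to finish out of order. The names slow_square and compare_orderings are hypothetical and exist only for illustration. imap_unordered is not used in the answer; it appears here purely as a contrast, since it yields results as they complete while map returns them in input order.

```python
import time
from multiprocessing.pool import Pool

def slow_square(n):
    # Hypothetical task: smaller inputs sleep longer, so the tasks tend to
    # finish in the reverse of the order in which they were submitted.
    time.sleep(0.1 * (4 - n))
    return n * n

def compare_orderings(pool, numbers):
    # map collects everything and returns it in input order.
    in_order = pool.map(slow_square, numbers)
    # imap_unordered hands back each result as soon as it is ready.
    as_completed = list(pool.imap_unordered(slow_square, numbers))
    return in_order, as_completed

if __name__ == "__main__":
    with Pool(processes=3) as pool:
        in_order, as_completed = compare_orderings(pool, [1, 2, 3])
    print("map:           ", in_order)
    print("imap_unordered:", as_completed)
```

The map line always matches the input order. The imap_unordered line depends on scheduling, so its order should not be relied on.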
As @Guy mentioned, the GIL hurts us here. You can change Pool to ThreadPool in the code above and see how it affects the timing of the calculations. calcNum is pure-Python, CPU-bound code, and the GIL allows only one thread to execute Python bytecode at a time, so with threads it still runs close to serially.
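One way to try the Pool versus ThreadPool comparison is a rough timing sketch like the one below. The helper timed_map and the constant ITERATIONS are hypothetical names, and the loop count is reduced from the 5,000,000 used above (an assumption, made only so the demo finishes quickly). The timings vary by machine, so no particular numbers are implied.

```python
import time
from multiprocessing.pool import Pool, ThreadPool

# Assumption: fewer iterations than the original 5,000,000, to keep the demo short.
ITERATIONS = 500000

def calc_num(n):
    # Same CPU-bound loop as calcNum above; floor division keeps the
    # values as integers on Python 3.
    m = n
    for i in range(ITERATIONS):
        m += i % 25
        if m > n * n:
            m //= 2
    return m

def timed_map(pool_cls, nums, workers=3):
    # Run calc_num over nums with the given pool class and
    # return (results, elapsed seconds).
    start = time.perf_counter()
    with pool_cls(processes=workers) as pool:
        results = pool.map(calc_num, nums)
    return results, time.perf_counter() - start

if __name__ == "__main__":
    nums = [12, 25, 76, 38, 8, 2, 5]
    for pool_cls in (ThreadPool, Pool):
        results, seconds = timed_map(pool_cls, nums)
        print(pool_cls.__name__, "took", round(seconds, 2), "seconds")
```

Both runs should return the same list. On a standard CPython build with several cores, the Pool run would be expected to finish sooner once the work outweighs the cost of starting the worker processes.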
Multiprocessing with a process or pool essentially starts further instances of your script, which gets around the GIL. If you watch your running processes while the code above runs, you'll see extra instances of 'python.exe' start while the pool is running. In this case you'll see a total of 4: the parent plus three workers.
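If you would rather not watch a task manager, a sketch like the following reports the process IDs from inside the workers. worker_pid and distinct_worker_pids are hypothetical names used only for this illustration.

```python
import os
from multiprocessing.pool import Pool

def worker_pid(_):
    # Report the ID of whichever process executed this call.
    return os.getpid()

def distinct_worker_pids(pool, tasks=30):
    # Collect the set of process IDs that handled the tasks.
    return set(pool.map(worker_pid, range(tasks)))

if __name__ == "__main__":
    with Pool(processes=3) as pool:
        pids = distinct_worker_pids(pool)
    print("parent:", os.getpid())
    print("workers:", sorted(pids))
```

With a Pool of three processes, the worker IDs differ from the parent's, and there are at most three of them (fewer if one worker happens to grab all the tasks). Passing a ThreadPool instead would report only the parent's own ID, because threads live inside the same process.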