Using threading/multiprocessing in Python to do multiple calculations at the same time

Question

I have a list of numbers. I want to perform some time-consuming operation on each number in the list and make a new list with all the results. Here's a simplified version of what I have:

def calcNum(n):#some arbitrary, time-consuming calculation on a number
  m = n
  for i in range(5000000):
    m += i%25
    if m > n*n:
      m /= 2
  return m

nums = [12,25,76,38,8,2,5]
finList = []

for i in nums:
  return_val = calcNum(i)
  finList.append(return_val)

print(finList)

Now, I wanted to take advantage of the multiple cores in my CPU, and give each of them a task of processing one of the numbers, and since the "number calculation" function is self-contained from start to finish I figured this would be fairly simple to do and a perfect situation for multiprocessing/threading.

My question is, which one should I use (multiprocessing or threading?), and what is the simplest way to do this?

I did a test with various code I found in other questions to achieve this, and while it runs fine it doesn't seem to be doing any actual multithreading/processing and takes just as long as my first test:

from multiprocessing.pool import ThreadPool

def calcNum(n):#some arbitrary, time-consuming calculation on a number
  m = n
  for i in range(5000000):
    m += i%25
    if m > n*n:
      m /= 2
  return m

pool = ThreadPool(processes=3)

nums = [12,25,76,38,8,2,5]
finList = []

for i in nums:
  async_result = pool.apply_async(calcNum, (i,))
  return_val = async_result.get()
  finList.append(return_val)

print(finList)

Recommended answer

multiprocessing.Pool and Pool.map are your best friends here. map saves a lot of headache because it hides the queues and other plumbing you would otherwise need to make this work. All you need to do is set up the pool, give it the maximum number of processes, and point it at the function and the iterable. See the working code below.

Because pool.map blocks until every task is done (and the pool is then closed and joined), the program waits until ALL processes have returned something before giving you the result.

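from multiprocessing import Pool

def calcNum(n):  # some arbitrary, time-consuming calculation on a number
  print("Calcs Started on ", n)
  m = n
  for i in range(5000000):
    m += i%25
    if m > n*n:
      m //= 2  # integer division, so the results stay ints under Python 3
  return m

if __name__ == "__main__":  # guard so worker processes can import this module safely
  p = Pool(processes=3)

  nums = [12,25,76,38,8,2,5]

  result = p.map(calcNum, nums)  # blocks until every number has been processed
  p.close()  # no more tasks will be submitted
  p.join()   # wait for the worker processes to exit

  print(result)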

That will give you something like this:

Calcs Started on  12
Calcs Started on  25
Calcs Started on  76
Calcs Started on  38
Calcs Started on  8
Calcs Started on  2
Calcs Started on  5
[72, 562, 5123, 1270, 43, 23, 23]

Regardless of when each process is started or when it completes, map waits for each to finish and then puts them all back in the correct order (corresponding to the input iterable).
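What map does here can be sketched by hand with apply_async, which also shows why the loop in the question ran serially: calling get() right after each apply_async blocks until that one task is finished, so only one task is ever in flight. The sketch below submits every task first and only then collects the results in submission order. The names calc_num and map_by_hand are hypothetical, calc_num is a much smaller stand-in for calcNum, and ThreadPool is used only to keep the sketch light (Pool exposes the same apply_async and map methods).

```python
from multiprocessing.pool import ThreadPool

def calc_num(n):
    # Hypothetical, much smaller stand-in for calcNum
    return sum(i % 25 for i in range(n * 1000))

def map_by_hand(pool, func, items):
    # Submit every task first so the workers can overlap...
    pending = [pool.apply_async(func, (item,)) for item in items]
    # ...then collect in submission order, which matches the input order.
    return [job.get() for job in pending]

if __name__ == "__main__":
    nums = [12, 25, 76, 38, 8, 2, 5]
    with ThreadPool(processes=3) as pool:
        by_hand = map_by_hand(pool, calc_num, nums)
        built_in = pool.map(calc_num, nums)
    # Both approaches return results in the order of nums
    print(by_hand == built_in)
```

In practice pool.map is the simpler choice; the hand-written version is only meant to make the submit-then-collect pattern visible.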

As @Guy mentioned, the GIL hurts us here. You can change Pool to ThreadPool in the code above and see how it affects the timing of the calculations. calcNum is pure-Python, CPU-bound code, and the GIL only allows one thread to execute Python bytecode at a time, so with threads it still runs near enough serially. Multiprocessing with a Process or a Pool essentially starts further instances of your script, each with its own interpreter and its own GIL, which gets around the issue. If you watch your running processes during the above, you'll see extra instances of 'python.exe' start while the pool is running. In this case, you'll see a total of 4: the parent plus the 3 workers.
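One way to try the Pool versus ThreadPool comparison is a small timing harness like the sketch below. The helper timed_map and the shortened loop in calc_num are assumptions made for illustration and are not part of the original answer. With such a short loop the process start-up cost may dominate, so raise the range back towards 5000000 to see the effect described above.

```python
import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def calc_num(n):
    # Same shape as calcNum, with a shorter loop so the sketch runs quickly
    m = n
    for i in range(200000):
        m += i % 25
        if m > n * n:
            m //= 2
    return m

def timed_map(pool_cls, nums, workers=3):
    # Hypothetical helper: map calc_num over nums and time the whole run
    start = time.perf_counter()
    with pool_cls(processes=workers) as pool:
        results = pool.map(calc_num, nums)
    return results, time.perf_counter() - start

if __name__ == "__main__":
    nums = [12, 25, 76, 38, 8, 2, 5]
    thread_results, thread_secs = timed_map(ThreadPool, nums)
    process_results, process_secs = timed_map(Pool, nums)
    print("ThreadPool: %.2fs" % thread_secs)
    print("Pool:       %.2fs" % process_secs)
    # Both pools compute the same values; only the timing differs
    print(thread_results == process_results)
```

The actual timings depend on the machine and the number of cores, so no figures are claimed here.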
