Python Multiprocessing Process or Pool for what I am doing?


Problem description

I'm new to multiprocessing in Python and trying to figure out if I should use Pool or Process for calling two functions async. The two functions I have make curl calls and parse the information into 2 separate lists. Depending on the internet connection, each function could take about 4 seconds. I realize that the bottleneck is the ISP connection and multiprocessing won't speed it up much, but it would be nice to have them both kick off async. Plus, this is a great learning experience for me to get into Python's multiprocessing because I will be using it more later.

I have read Python multiprocessing.Pool: when to use apply, apply_async or map? and it was useful, but I still had my own questions.

So one way I could do it is:

from multiprocessing import Process

def foo():
    pass

def bar():
    pass

p1 = Process(target=foo, args=())
p2 = Process(target=bar, args=())

p1.start()
p2.start()
p1.join()
p2.join()

The question I have for this implementation is: 1) Since .join() blocks until the process it is called on has completed... does this mean the p1 process has to finish before the p2 process is kicked off? I always understood .join() to be the same as pool.apply() and pool.apply_async().get(), where the parent process cannot launch another process (task) until the currently running one has completed.
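The concurrency question can be checked directly with a timing sketch. This is a hypothetical stand-in, not the original curl code: foo and bar just sleep for half a second each. Because both processes are started before either join(), the total wall time is roughly one sleep, not two.

```python
import time
from multiprocessing import Process

def foo():
    time.sleep(0.5)  # stand-in for one slow curl call

def bar():
    time.sleep(0.5)  # stand-in for the other slow curl call

def run_both():
    start = time.time()
    p1 = Process(target=foo)
    p2 = Process(target=bar)
    p1.start()
    p2.start()   # p2 is launched while p1 is still running
    p1.join()    # blocks only the parent process, not p2
    p2.join()
    return time.time() - start

if __name__ == "__main__":
    # The two sleeps overlap, so this prints roughly 0.5, not 1.0
    print(run_both())
```

If the two start() calls serialized execution, run_both() would take about a second; because they overlap, it takes about half that.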

The other way would be something like this:

from multiprocessing import Pool

def foo():
    pass

def bar():
    pass

pool = Pool(processes=2)
p1 = pool.apply_async(foo)
p2 = pool.apply_async(bar)

Questions I have for this implementation would be:
1) Do I need pool.close() and pool.join()?
2) Would pool.map() make them all complete before I could get results? And if so, are they still run async?
3) How would pool.apply_async() differ from doing each process with pool.apply()?
4) How would this differ from the previous implementation with Process?
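A hedged sketch of the Pool variant with result collection (foo and bar here return placeholder strings rather than parsed curl output): apply_async() returns an AsyncResult immediately, .get() blocks until that task finishes, and using the pool as a context manager handles shutdown on exit.

```python
from multiprocessing import Pool

def foo():
    return "foo result"  # placeholder for one parsed curl result

def bar():
    return "bar result"  # placeholder for the other

def run_pool():
    with Pool(processes=2) as pool:   # context manager terminates the pool on exit
        r1 = pool.apply_async(foo)    # returns an AsyncResult immediately
        r2 = pool.apply_async(bar)    # both tasks now run concurrently
        return [r1.get(), r2.get()]   # .get() blocks until each task finishes

if __name__ == "__main__":
    print(run_pool())
```

Without the context manager, the equivalent tidy-up would be an explicit pool.close() followed by pool.join().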

Answer

The two scenarios you listed accomplish the same thing but in slightly different ways.

The first scenario starts two separate processes (call them P1 and P2) and starts P1 running foo and P2 running bar, and then waits until both processes have finished their respective tasks.

The second scenario starts two processes (call them Q1 and Q2) and first starts foo on either Q1 or Q2, and then starts bar on either Q1 or Q2. Then the code waits until both function calls have returned.

So the net result is actually the same, but in the first case you're guaranteed to run foo and bar on different processes.

As for the specific questions you had about concurrency, the .join() method on a Process does indeed block until the process has finished, but because you called .start() on both P1 and P2 (in your first scenario) before joining, then both processes will run asynchronously. The interpreter will, however, wait until P1 finishes before attempting to wait for P2 to finish.

For your questions about the pool scenario, you should technically use pool.close() but it kind of depends on what you might need it for afterwards (if it just goes out of scope then you don't need to close it necessarily). pool.map() is a completely different kind of animal, because it distributes a bunch of arguments to the same function (asynchronously), across the pool processes, and then waits until all function calls have completed before returning the list of results.
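To make the pool.map() contrast concrete, a small sketch using a trivial square function rather than the curl workers: the calls are farmed out across the pool concurrently, but map() itself does not return until every result is ready, and the results come back in input order.

```python
from multiprocessing import Pool

def square(x):
    return x * x

def run_map():
    with Pool(processes=2) as pool:
        # Blocks here until all four calls have completed
        return pool.map(square, [1, 2, 3, 4])

if __name__ == "__main__":
    print(run_map())  # → [1, 4, 9, 16]
```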
