Why is multiprocessing running things in the same process?
Question
I'm running the following, taken from the answer to How do you retrieve the return value of a function passed to multiprocessing.Process?:
import multiprocessing
from os import getpid

def worker(procnum):
    print('I am number %d in process %d' % (procnum, getpid()))
    return getpid()

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=3)
    print(pool.map(worker, range(5)))
which should output something like:
I am number 0 in process 19139
I am number 1 in process 19138
I am number 2 in process 19140
I am number 3 in process 19139
I am number 4 in process 19140
[19139, 19138, 19140, 19139, 19140]
but instead I only get
[4212, 4212, 4212, 4212, 4212]
If I feed pool.map a range of 1,000,000 using more than 10 processes, I see at most two different PIDs.
Why is this happening?
Accepted answer
TL;DR: Tasks are not distributed in any particular way; your tasks may simply be so short that they are all completed before the other processes get started.
From looking at the source of multiprocessing, it appears that tasks are simply put in a Queue, which the worker processes read from (the worker function reads from Pool._inqueue). There is no calculated distribution going on; the workers just race to grab as much work as possible.
The most likely explanation, then, is that the tasks are so short that one process finishes all of them before the others have a chance to look, or even to get started. You can easily check whether this is the case by adding a two-second sleep to the task.
I'll note that on my machine the tasks all get spread over the processes fairly evenly (also for #processes > #cores), so there seems to be some system dependence, even though all processes should have been .start()ed before work is queued.
Here is some trimmed source from worker, showing that tasks are simply read from the queue by each process, in effectively pseudo-random order:
def worker(inqueue, outqueue, ...):
    ...
    get = inqueue.get
    ...
    while maxtasks is None or (maxtasks and completed < maxtasks):
        try:
            task = get()
            ...
The SimpleQueue communicates between processes using a Pipe; from the SimpleQueue constructor:

    self._reader, self._writer = Pipe(duplex=False)
EDIT: Possibly the part about processes starting too slowly is false, so I removed it. All processes are .start()ed before any work is queued (which may be platform-dependent). I can't find whether a process is ready at the moment .start() returns.