Counting total number of tasks executed in a multiprocessing.Pool during execution
Problem description
I'd love to give an indication of which task, out of the total, we are currently on. I'm farming work out and would like to know the current progress. So if I send 100 jobs to 10 processors, how can I show how many of the jobs have returned so far? I can get the process IDs, but how do I count the number of completed jobs returned from my map function?
I call my function as follows:
```python
op_list = pool.map(PPMDR_star, list(varg))
```
In my function, I can print the name of the current process:
```python
import multiprocessing

current = multiprocessing.current_process()
print('Running:', current.name, current._identity)
```
Recommended answer
If you use pool.map_async, you can pull this information out of the MapResult instance that gets returned. For example:
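```python
import multiprocessing
import time


def worker(i):
    time.sleep(i)  # simulate a job that takes i seconds
    return i


if __name__ == "__main__":
    pool = multiprocessing.Pool()
    # map_async returns a MapResult immediately instead of blocking
    result = pool.map_async(worker, range(15))
    while not result.ready():
        # _number_left is a private attribute: the number of chunks not yet finished
        print("num left: {}".format(result._number_left))
        time.sleep(1)
    real_result = result.get()
    pool.close()
    pool.join()
```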
Output:
```
num left: 15
num left: 14
num left: 13
num left: 12
num left: 11
num left: 10
num left: 9
num left: 9
num left: 8
num left: 8
num left: 7
num left: 7
num left: 6
num left: 6
num left: 6
num left: 5
num left: 5
num left: 5
num left: 4
num left: 4
num left: 4
num left: 3
num left: 3
num left: 3
num left: 2
num left: 2
num left: 2
num left: 2
num left: 1
num left: 1
num left: 1
num left: 1
```
multiprocessing internally breaks the iterable you pass to map into chunks and passes each chunk to the child processes. So the _number_left attribute actually tracks the number of chunks remaining, not the number of individual elements in the iterable. Keep that in mind if you see odd-looking numbers when you use large iterables. Chunking is used to improve IPC performance, but if an accurate tally of completed results matters more to you than the extra performance, you can pass the chunksize=1 keyword argument to map_async so that _number_left counts individual items. (The chunksize usually only makes a noticeable performance difference for very large iterables. Try it yourself to see whether it really matters for your use case.)
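To see why the counts can look odd, here is a small sketch of the chunk arithmetic. It assumes CPython's current default heuristic for chunksize=None (split into roughly four chunks per worker process); that is an implementation detail rather than a documented guarantee, and the helper names `default_chunksize` and `initial_number_left` are hypothetical.

```python
# Hypothetical helpers mirroring CPython's chunking for Pool.map/map_async.
# Assumption: this matches the current CPython implementation; it may change.

def default_chunksize(n_items: int, n_processes: int) -> int:
    """Chunk size picked when chunksize=None (assumed CPython heuristic)."""
    if n_items == 0:
        return 0
    chunksize, extra = divmod(n_items, n_processes * 4)
    if extra:
        chunksize += 1
    return chunksize


def initial_number_left(n_items: int, chunksize: int) -> int:
    """Number of chunks, i.e. the starting value of MapResult._number_left."""
    if n_items == 0:
        return 0
    return n_items // chunksize + bool(n_items % chunksize)


if __name__ == "__main__":
    # 100 jobs on 10 processes: chunksize 3, so the counter starts at 34, not 100.
    cs = default_chunksize(100, 10)
    print(cs, initial_number_left(100, cs))  # 3 34
    # With chunksize=1 every item is its own chunk.
    print(initial_number_left(100, 1))  # 100
    # 15 items on 8 processes: chunksize 1, so the counter starts at 15.
    cs = default_chunksize(15, 8)
    print(cs, initial_number_left(15, cs))  # 1 15
```

Under this assumption, the asker's 100 jobs on 10 processes would show `_number_left` counting down from 34 rather than 100, which is exactly the kind of "odd-looking number" described above.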
As you mentioned in the comments, because pool.map is blocking, you can't really get this information unless you start a background thread that does the polling while the main thread blocks in the map call, but I'm not sure there's any benefit to doing that over the above approach.
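Note that pool.map never hands you its MapResult, so a background thread has nothing built in to poll. One way to give it something is a shared counter that each worker increments when it finishes an item; this is a different technique from reading _number_left, and the names `_init_worker`, `work`, `report_progress` and `run` are hypothetical. A minimal sketch, assuming a `multiprocessing.Value` passed to the workers through the Pool initializer:

```python
import multiprocessing
import threading
import time

# Assumption: a shared counter, set in each worker by the Pool initializer.
_counter = None


def _init_worker(counter):
    # Runs once in every worker process and stores the shared counter.
    global _counter
    _counter = counter


def work(i):
    time.sleep(0.1)  # stand-in for the real job
    with _counter.get_lock():  # increment atomically across processes
        _counter.value += 1
    return i * i


def report_progress(counter, total, done_event, interval=0.5):
    # Background thread: polls the counter while the main thread blocks in map().
    while not done_event.wait(interval):
        print(f"completed: {counter.value}/{total}")


def run(items, processes=4):
    counter = multiprocessing.Value("i", 0)
    done = threading.Event()
    with multiprocessing.Pool(
        processes, initializer=_init_worker, initargs=(counter,)
    ) as pool:
        reporter = threading.Thread(
            target=report_progress, args=(counter, len(items), done), daemon=True
        )
        reporter.start()
        results = pool.map(work, items)  # blocks until every item is done
        done.set()
        reporter.join()
    print(f"completed: {counter.value}/{len(items)}")
    return results


if __name__ == "__main__":
    run(list(range(20)))
```

Unlike _number_left, this counter goes up once per item regardless of chunksize, because the worker itself does the counting.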
The other thing to keep in mind is that you're using an internal attribute of MapResult, so it's possible that this could break in future versions of Python.
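If relying on a private attribute is a concern, a swapped-in alternative that uses only the public API is Pool.imap_unordered, which yields each result as soon as its task finishes (its chunksize defaults to 1), so simply counting the yielded results gives an exact progress figure. A minimal sketch; `worker` and `run_with_progress` are hypothetical names, and each task carries its input index so the results can be put back in order:

```python
import multiprocessing
import time


def worker(args):
    # Hypothetical job: args is an (index, item) pair produced by enumerate().
    idx, item = args
    time.sleep(0.05 * (item % 3))  # stand-in for uneven amounts of work
    return idx, item * item


def run_with_progress(items, processes=4):
    total = len(items)
    results = [None] * total
    with multiprocessing.Pool(processes) as pool:
        # Results arrive in completion order, so enumerate() counts finished items.
        for done, (idx, value) in enumerate(
            pool.imap_unordered(worker, enumerate(items)), start=1
        ):
            results[idx] = value  # restore the original input order
            print(f"completed {done}/{total}")
    return results


if __name__ == "__main__":
    print(run_with_progress(list(range(10))))
```

The trade-off is the same one discussed above: with chunksize=1 every item is a separate IPC round trip, so for very large iterables you may want a larger chunksize and coarser progress reports.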