Counting total number of tasks executed in a multiprocessing.Pool during execution

Question

I'd love to give an indication of overall progress. I'm farming work out and would like to know how far along it is. So if I send 100 jobs to 10 processors, how can I show how many jobs have returned so far? I can get the process IDs, but how do I count the completed jobs returned from my map function?

I call my function as follows:

op_list = pool.map(PPMDR_star, list(varg))

And in my function I can print the name of the current process:

current = multiprocessing.current_process()
print('Running:', current.name, current._identity)

Recommended answer

If you use pool.map_async you can pull this information out of the MapResult instance that gets returned. For example:

import multiprocessing
import time

def worker(i):
    time.sleep(i)
    return i


if __name__ == "__main__":
    pool = multiprocessing.Pool()
    result = pool.map_async(worker, range(15))
    while not result.ready():
        print("num left: {}".format(result._number_left))
        time.sleep(1)
    real_result = result.get()
    pool.close()
    pool.join()

Output:

num left: 15
num left: 14
num left: 13
num left: 12
num left: 11
num left: 10
num left: 9
num left: 9
num left: 8
num left: 8
num left: 7
num left: 7
num left: 6
num left: 6
num left: 6
num left: 5
num left: 5
num left: 5
num left: 4
num left: 4
num left: 4
num left: 3
num left: 3
num left: 3
num left: 2
num left: 2
num left: 2
num left: 2
num left: 1
num left: 1
num left: 1
num left: 1

multiprocessing internally breaks the iterable you pass to map into chunks, and passes each chunk to the child processes. So the _number_left attribute really keeps track of the number of chunks remaining, not the individual elements in the iterable. Keep that in mind if you see odd-looking numbers when you use large iterables. It uses chunking to improve IPC performance, but if seeing an accurate tally of completed results is more important to you than the added performance, you can pass the chunksize=1 keyword argument to map_async to make _number_left more accurate. (The chunksize usually only makes a noticeable performance difference for very large iterables. Try it for yourself to see if it really matters for your use case.)
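As a rough sketch of the chunksize=1 variant, the loop below turns the remaining-chunk count into a "done so far" count. The function name run_with_chunk_progress and the squaring worker are made up for illustration, and the code still leans on the private _number_left attribute, so treat it as an assumption about current CPython rather than a supported API:

```python
import multiprocessing
import time


def worker(i):
    # Stand-in task; the short sleep simulates real work.
    time.sleep(0.05)
    return i * i


def run_with_chunk_progress(items, processes=2):
    """Run worker over items, printing an approximate completed count."""
    items = list(items)
    with multiprocessing.Pool(processes) as pool:
        # chunksize=1 puts exactly one item in each chunk, so the
        # remaining-chunk counter equals the remaining-item counter.
        result = pool.map_async(worker, items, chunksize=1)
        while not result.ready():
            # _number_left is private and undocumented; skip the report
            # if a given Python version does not provide it.
            left = getattr(result, "_number_left", None)
            if left is not None:
                print("done: {}/{}".format(len(items) - left, len(items)))
            time.sleep(0.1)
        return result.get()


if __name__ == "__main__":
    print(run_with_chunk_progress(range(10)))
```

Without chunksize=1 the same loop would count finished chunks, so the "done" figure would jump in steps of the chunk size instead of one item at a time.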

As you mentioned in the comments, because pool.map is blocking, you can't really get this unless you were to start a background thread that did the polling while the main thread blocked in the map call, but I'm not sure there's any benefit to doing that over the above approach.
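For completeness, here is one way the background-thread idea could look. Since a blocking pool.map call returns no result object to poll, this sketch assumes a shared multiprocessing.Value counter that each worker increments; the names _init_worker, report and map_with_progress are hypothetical helpers, not part of multiprocessing:

```python
import multiprocessing
import threading
import time

_counter = None  # set in every worker process by _init_worker


def _init_worker(counter):
    # Runs once per worker process and stores the shared counter globally.
    global _counter
    _counter = counter


def worker(i):
    time.sleep(0.05)  # stand-in for real work
    with _counter.get_lock():
        _counter.value += 1  # record one finished task
    return i


def report(counter, total, stop, interval=0.1):
    # Background thread: print progress until the stop event is set.
    while not stop.wait(interval):
        print("completed: {}/{}".format(counter.value, total))


def map_with_progress(items, processes=2):
    items = list(items)
    counter = multiprocessing.Value("i", 0)
    stop = threading.Event()
    reporter = threading.Thread(target=report,
                                args=(counter, len(items), stop))
    reporter.start()
    try:
        with multiprocessing.Pool(processes, initializer=_init_worker,
                                  initargs=(counter,)) as pool:
            results = pool.map(worker, items)  # main thread blocks here
    finally:
        stop.set()
        reporter.join()
    return results, counter.value


if __name__ == "__main__":
    print(map_with_progress(range(10)))
```

This counts individual items regardless of chunking, at the cost of a lock acquisition per task and noticeably more plumbing than the map_async loop.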

The other thing to keep in mind is that you're using an internal attribute of MapResult, so it's possible that this could break in future versions of Python.
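If relying on a private attribute is a concern, a different technique that uses only the documented API is to iterate over pool.imap_unordered and count results as they arrive. This is not the map_async approach shown above, and run_counting_results is a made-up name for the sketch:

```python
import multiprocessing
import time


def worker(i):
    time.sleep(0.05)  # stand-in for real work
    return i * i


def run_counting_results(items, processes=2):
    """Collect results while printing how many tasks have finished."""
    items = list(items)
    results = []
    with multiprocessing.Pool(processes) as pool:
        # imap_unordered yields each result as soon as its task is done,
        # so the loop index doubles as a completed-task counter.
        finished = pool.imap_unordered(worker, items)
        for done, value in enumerate(finished, start=1):
            results.append(value)
            print("completed: {}/{}".format(done, len(items)))
    return results


if __name__ == "__main__":
    print(sorted(run_counting_results(range(10))))
```

The trade-off is that results come back in completion order rather than input order; pool.imap keeps the input order but can only report progress up to the slowest outstanding item.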
