How is Python's map_async keeping results in order?

Question

I'm trying to explore Python's multiprocessing library for py3.3 and I noticed an odd result in the map_async function that I've been unable to explain. I've been expecting the results stored from the callback to be "out of order". That is, if I feed a number of tasks to the worker processes, some should complete before others, not necessarily in the same order they're fed in or exist in the input list. However, I'm getting an ordered set of results that corresponds perfectly with the input tasks. This is the case even after purposely trying to "sabotage" some processes by slowing them down (which, presumably, would allow others to complete before them).

I have a print statement in the calculate function that shows they're being calculated out of order, yet results are still in order. Though I'm not sure I can trust a print statement as a great indicator that things are actually calculating out of order.

The test process (a general example): Build a list of objects, each of which holds an integer. Submit that list of objects to map_async as arguments, along with the function "calculate", which updates the object's numValue attribute with its squared value. The "calculate" function then returns the object with its updated value.

Some code:

import time
import multiprocessing
import random

class NumberHolder():
    def __init__(self,numValue):
        self.numValue = numValue    #Only one attribute

def calculate(obj):
    if random.random() >= 0.5:
        startTime = time.time()
        timeWaster = [random.random() for x in range(5000000)] #Waste time.
        endTime = time.time()           #Establish end time
        print("%d object got stuck in here for %f seconds"%(obj.numValue,endTime-startTime))

    obj.numValue = obj.numValue**2      #Square the value held by the object
    return obj                          #Return the object with its updated value

#Main Process
if __name__ == '__main__':
    numbersToSquare = [x for x in range(0,100)]     #I'm 
    taskList = []

    for eachNumber in numbersToSquare:
        taskList.append(NumberHolder(eachNumber))   #Create a list of objects whose numValue is equal to the numbers we want to square

    results = [] #Where the results will be stored
    pool = multiprocessing.Pool(processes=max(1, multiprocessing.cpu_count() - 1)) #Don't use all my processing power, but always keep at least one worker.
    r = pool.map_async(calculate, taskList, callback=results.append)  #Using fxn "calculate", feed taskList, and values stored in "results" list
    r.wait()                # Wait on the results from the map_async

    results = results[0]    #All of the entries only exist in the first offset
    for eachObject in results:      #Loop through them and show them
        print(eachObject.numValue)          #If they calc'd "out of order", I'd expect append out of order

I found this well-written response, which seems to support the idea that map_async can have results that are "out of order": "multiprocessing.Pool: When to use apply, apply_async or map?". I also looked up the documentation ( http://docs.python.org/3.3/library/multiprocessing.html ). For map_async it says: "...If callback is specified then it should be a callable which accepts a single argument. When the result becomes ready callback is applied to it (unless the call failed). callback should complete immediately since otherwise the thread which handles the results will get blocked."

Am I misunderstanding how this is supposed to work? Any help is much appreciated.

Recommended answer

That's the expected behavior. The docs say:

A variant of the map() method which returns a result object.

The "result object" is just a container class that holds the calculated results. When you call r.wait(), you wait until all of the results are aggregated and put in order. Even though it processes tasks out of order, the results will still be in the original order.
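
A minimal sketch of that behavior, under a few assumptions: it uses multiprocessing.pool.ThreadPool (which exposes the same map_async method as Pool) so that it runs without a __main__ guard or pickling, and the names slow_square, tracked, and run_map_async are made up for illustration. Smaller inputs sleep longer, so tasks tend to finish in reverse, yet the aggregated list still follows input order and the callback fires once with the whole list, which is why everything ends up in results[0] in the question.

```python
import threading
import time
from multiprocessing.pool import ThreadPool


def slow_square(n):
    # Hypothetical workload: smaller inputs sleep longer,
    # so later tasks tend to finish before earlier ones.
    time.sleep((5 - n) * 0.02)
    return n * n


def run_map_async(inputs):
    completion_order = []  # inputs in the order their tasks actually finished
    callback_calls = []    # one entry per callback invocation
    lock = threading.Lock()

    def tracked(n):
        result = slow_square(n)
        with lock:
            completion_order.append(n)
        return result

    with ThreadPool(processes=len(inputs)) as pool:
        r = pool.map_async(tracked, inputs, callback=callback_calls.append)
        ordered = r.get()  # blocks until every task has finished
    return ordered, completion_order, callback_calls


if __name__ == "__main__":
    ordered, completion_order, callback_calls = run_map_async([0, 1, 2, 3, 4])
    print("results (input order):", ordered)
    print("completion order:", completion_order)
    print("callback invocations:", len(callback_calls))
```

The completion order depends on timing and may differ between runs; the order of the returned list does not. Swapping in multiprocessing.Pool with a module-level function under a __main__ guard should give the same ordering guarantee.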

If you want the results to be yielded as they are calculated, use imap_unordered.
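
A rough sketch of the imap_unordered alternative, with the same assumptions as before: ThreadPool stands in for Pool so the snippet runs without a __main__ guard, and slow_square and run_imap_unordered are illustrative names. Because the output order no longer matches the input order, each task returns its input alongside the result so the two can be paired up again.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_square(n):
    # Hypothetical workload: smaller inputs sleep longer.
    time.sleep((5 - n) * 0.02)
    # Return the input too, since position no longer identifies the task.
    return n, n * n


def run_imap_unordered(inputs):
    with ThreadPool(processes=len(inputs)) as pool:
        # Consume the iterator inside the with block: leaving the block
        # terminates the pool. Items arrive as tasks finish.
        return list(pool.imap_unordered(slow_square, inputs))


if __name__ == "__main__":
    for n, squared in run_imap_unordered([0, 1, 2, 3, 4]):
        print(n, "->", squared)
```

In a real program you would normally process each item inside the loop as it arrives instead of collecting everything into a list first.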
