Python multiprocessing - tracking the process of pool.map operation


Question

I have a function which performs some simulation and returns an array in string format.

I want to run the simulation (the function) for varying input parameter values, over 10000 possible input values, and write the results to a single file.

I am using multiprocessing, specifically the pool.map function, to run the simulations in parallel.

Since the whole process of running the simulation function over 10000 times takes a very long time, I really would like to track the process of the entire operation.

I think the problem in my current code below is that pool.map runs the function 10000 times without any progress tracking during those operations. Only once the parallel processing finishes running the 10000 simulations (which could take hours to days) do I start tracking when the 10000 simulation results are saved to a file. So this is not really tracking the processing of the pool.map operation.

Is there an easy fix to my code that will allow process tracking?

import multiprocessing
import numpy as np

def simFunction(input):
    # Does some simulation and outputs simResult
    return str(simResult)

# Parallel processing
inputs = np.arange(0, 10000, 1)

if __name__ == "__main__":
    numCores = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(processes=numCores)
    t = pool.map(simFunction, inputs)
    with open('results.txt', 'w') as out:
        print("Starting to simulate " + str(len(inputs)) + " input values...")
        counter = 0
        for i in t:
            out.write(i + '\n')
            counter = counter + 1
            if counter % 100 == 0:
                print(str(counter) + " of " + str(len(inputs)) + " input values simulated")
    print('Finished!!!!')

Answer

If you use an iterated map function, it's pretty easy to keep track of progress.

>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> def simFunction(x, y):
...   import time
...   time.sleep(2)
...   return x**2 + y
... 
>>> x, y = range(100), range(-100, 100, 2)
>>> res = Pool().imap(simFunction, x, y)
>>> with open('results.txt', 'w') as out:
...   for i in x:
...     out.write("%s\n" % next(res))
...     if i % 10 == 0:
...       print("%s of %s simulated" % (i, len(x)))
... 
... 
0 of 100 simulated
10 of 100 simulated
20 of 100 simulated
30 of 100 simulated
40 of 100 simulated
50 of 100 simulated
60 of 100 simulated
70 of 100 simulated
80 of 100 simulated
90 of 100 simulated

Or, you can use an asynchronous map. Here I'll do things a little differently, just to mix it up.

>>> import time
>>> res = Pool().amap(simFunction, x, y)
>>> while not res.ready():
...   print("waiting...")
...   time.sleep(5)
... 
... 
waiting...
waiting...
waiting...
waiting...
>>> res.get()
[-100, -97, -92, -85, -76, -65, -52, -37, -20, -1, 20, 43, 68, 95, 124, 155, 188, 223, 260, 299, 340, 383, 428, 475, 524, 575, 628, 683, 740, 799, 860, 923, 988, 1055, 1124, 1195, 1268, 1343, 1420, 1499, 1580, 1663, 1748, 1835, 1924, 2015, 2108, 2203, 2300, 2399, 2500, 2603, 2708, 2815, 2924, 3035, 3148, 3263, 3380, 3499, 3620, 3743, 3868, 3995, 4124, 4255, 4388, 4523, 4660, 4799, 4940, 5083, 5228, 5375, 5524, 5675, 5828, 5983, 6140, 6299, 6460, 6623, 6788, 6955, 7124, 7295, 7468, 7643, 7820, 7999, 8180, 8363, 8548, 8735, 8924, 9115, 9308, 9503, 9700, 9899]

Note that I'm using pathos.multiprocessing instead of multiprocessing. It's just a fork of multiprocessing that enables you to do map functions with multiple inputs, has much better serialization, and allows you to execute map calls anywhere (not just in __main__). You could use multiprocessing to do the above as well, however the code would be very slightly different.
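For reference, a roughly equivalent sketch using only the standard library's multiprocessing is below. The key change from the question's code is using Pool.imap instead of Pool.map: imap returns an iterator that yields results as workers finish them, so progress can be reported while the pool is still running. (The single-argument simFunction here is a hypothetical stand-in for the real simulation, since the stdlib Pool.imap takes one iterable.)

```python
import multiprocessing

def simFunction(input):
    # hypothetical stand-in for the real simulation
    return str(input ** 2)

if __name__ == "__main__":
    inputs = range(10000)
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
        with open('results.txt', 'w') as out:
            # imap yields each result as soon as it is computed, in input
            # order, so the loop body runs while workers are still busy
            for counter, result in enumerate(pool.imap(simFunction, inputs), 1):
                out.write(result + '\n')
                if counter % 100 == 0:
                    print("%s of %s input values simulated" % (counter, len(inputs)))
```

Unlike pathos, the stdlib Pool must be created under the `if __name__ == "__main__":` guard, and simFunction must be defined at module level so it can be pickled.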

Either an iterated or asynchronous map will enable you to write whatever code you want to do better process tracking. For example, pass a unique "id" to each job and watch which come back, or have each job return its process id. There are lots of ways to track progress and processes… but the above should give you a start.
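A minimal sketch of that id-passing idea, using the stdlib's imap_unordered (the function and names here are illustrative, not from the original answer):

```python
import multiprocessing
import os

def simFunction(job):
    # each job carries a unique id alongside its input value
    job_id, value = job
    result = str(value ** 2)
    # returning the id (and the worker's pid) lets the parent see
    # exactly which jobs finished, and on which process
    return job_id, os.getpid(), result

if __name__ == "__main__":
    jobs = list(enumerate(range(1000)))
    done = set()
    with multiprocessing.Pool() as pool:
        # imap_unordered yields each result as soon as it is ready,
        # regardless of submission order
        for job_id, worker_pid, result in pool.imap_unordered(simFunction, jobs):
            done.add(job_id)
            if len(done) % 100 == 0:
                print("%d of %d jobs done (last from pid %d)"
                      % (len(done), len(jobs), worker_pid))
```

Because results arrive out of order, collect them keyed by job id if the output order matters.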

You can get pathos here: https://github.com/uqfoundation
