Fastest way to perform Multiprocessing of a loop in a function?

Problem description

1. I have a function var. I want to know the best possible way to run the loop within this function quickly via multiprocessing/parallel processing, utilizing all of the processors, cores, threads, and RAM the system has.

import numpy as np
from pysheds.grid import Grid

xs = 82.1206, 72.4542, 65.0431, 83.8056, 35.6744
ys = 25.2111, 17.9458, 13.8844, 10.0833, 24.8306

a = r'/home/test/image1.tif'
b = r'/home/test/image2.tif'

def var(interest):
    variable_avg = []
    for (x, y) in zip(xs, ys):
        grid = Grid.from_raster(interest, data_name='map')

        # delineate the catchment for this pour point
        grid.catchment(data='map', x=x, y=y, out_name='catch')

        variable = grid.view('catch', nodata=np.nan)
        variable = np.array(variable)
        variable_avg.append(variable.mean())
    return variable_avg

2. It would be great if I could run both the function var and the loop inside it in parallel for multiple arguments of the function, e.g. var(a) and var(b) at the same time, since that would take much less time than parallelizing the loop alone.

Ignore 2 if it doesn't make sense.

Recommended answer

TLDR: You can use the multiprocessing library to run your var function in parallel. However, as written, you likely don't make enough calls to var for multiprocessing to show a performance benefit, because of its overhead. If all you need to do is run those two calls, running in serial is likely the fastest you'll get. However, if you need to make a lot of calls, multiprocessing can help you out.

We'll need to use a process pool to run this in parallel; threads won't work here because Python's global interpreter lock prevents true parallelism. The drawback of process pools is that processes are heavyweight to spin up. In the example of running just two calls to var, the time to create the pool overwhelms the time spent running var itself.
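If you don't need the asyncio machinery, the same two-call experiment can be written with concurrent.futures alone. A minimal sketch, assuming var, a, and b are defined as in the question's snippet:

from concurrent.futures import ProcessPoolExecutor

# Submit both calls to a process pool and block until both results are ready.
# Assumes var(), a, and b come from the question's snippet above.
if __name__ == '__main__':
    with ProcessPoolExecutor() as pool:
        future_a = pool.submit(var, a)
        future_b = pool.submit(var, b)
        results = [future_a.result(), future_b.result()]
        print(results)

The same startup cost applies either way: the pool must spawn its worker processes before any work runs.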

To illustrate this, let's use a process pool with asyncio to run calls to var in parallel and compare it to just running things sequentially. Note that to run this example I used an image from the Pysheds library https://github.com/mdbartos/pysheds/tree/master/data - if your image is much larger, the below may not hold true.

import functools
import time
from concurrent.futures.process import ProcessPoolExecutor
import asyncio

# var() is the function from the question's snippet above
# (with numpy imported as np)

a = 'diem.tif'
xs = 10, 20, 30, 40, 50
ys = 10, 20, 30, 40, 50

async def main():
    loop = asyncio.get_running_loop()
    pool_start = time.time()
    with ProcessPoolExecutor() as pool:
        task_one = loop.run_in_executor(pool, functools.partial(var, a))
        task_two = loop.run_in_executor(pool, functools.partial(var, a))
        results = await asyncio.gather(task_one, task_two)
        pool_end = time.time()
        print(f'Process pool took {pool_end-pool_start}')

    serial_start = time.time()

    result_one = var(a)
    result_two = var(a)

    serial_end = time.time()
    print(f'Running in serial took {serial_end - serial_start}')


if __name__ == "__main__":
    asyncio.run(main())

Running the above on my machine (a 2.4 GHz 8-Core Intel Core i9) I get the following output:

Process pool took 1.7581260204315186
Running in serial took 0.32335805892944336

In this example, a process pool is over five times slower! This is due to the overhead of creating and managing multiple processes. That said, if you need to call var more than just a few times, a process pool may make more sense. Let's adapt this to run var 100 times and compare the results:

async def main():
    loop = asyncio.get_running_loop()
    pool_start = time.time()
    tasks = []
    with ProcessPoolExecutor() as pool:
        for _ in range(100):
            tasks.append(loop.run_in_executor(pool, functools.partial(var, a)))
        results = await asyncio.gather(*tasks)
        pool_end = time.time()
        print(f'Process pool took {pool_end-pool_start}')

    serial_start = time.time()

    for _ in range(100):
        result = var(a)

    serial_end = time.time()
    print(f'Running in serial took {serial_end - serial_start}')

Running 100 times, I get the following output:

Process pool took 3.442288875579834
Running in serial took 13.769982099533081

In this case, running in a process pool is about 4x faster. You may also wish to try running each iteration of your loop concurrently. You can do this by creating a function that processes one x,y coordinate at a time and then running each point you want to examine in the process pool:

def process_poi(interest, x, y):
    grid = Grid.from_raster(interest, data_name='map')

    grid.catchment(data='map', x=x, y=y, out_name='catch')

    variable = grid.view('catch', nodata=np.nan)
    variable = np.array(variable)
    return variable.mean()

async def var_loop_async(interest, pool, loop):
    tasks = []
    for (x,y) in zip(xs,ys):
        function_call = functools.partial(process_poi, interest, x, y)
        tasks.append(loop.run_in_executor(pool, function_call))

    return await asyncio.gather(*tasks)

async def main():
    loop = asyncio.get_running_loop()
    pool_start = time.time()
    tasks = []
    with ProcessPoolExecutor() as pool:
        for _ in range(100):
            tasks.append(var_loop_async(a, pool, loop))
        results = await asyncio.gather(*tasks)
        pool_end = time.time()
        print(f'Process pool took {pool_end-pool_start}')

    serial_start = time.time()

    for _ in range(100):
        result = var(a)

    serial_end = time.time()
    print(f'Running in serial took {serial_end - serial_start}')

In this case I get Process pool took 3.2950568199157715 - so not really any faster than our first version with one process per call of var. This is likely because the limiting factor at this point is how many cores are available on our CPU; splitting our work into smaller increments does not add much value.
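As a quick check on that limit, you can print how many logical CPUs Python sees; ProcessPoolExecutor defaults to one worker per logical CPU, so tasks beyond that count simply queue up rather than run in parallel:

import os

# Number of logical CPUs; also ProcessPoolExecutor's default worker count
print(os.cpu_count())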

That said, if you have 1000 x and y coordinates you wish to examine across two images, this last approach may yield a performance gain.
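
For instance, to sweep both images concurrently you could gather one var_loop_async call per raster. A minimal sketch, assuming the functions above, a second raster path b, and that xs and ys now hold your full coordinate lists:

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # One task per image; each fans out into one pool task per (x, y) point
        results_a, results_b = await asyncio.gather(
            var_loop_async(a, pool, loop),
            var_loop_async(b, pool, loop),
        )

if __name__ == "__main__":
    asyncio.run(main())

Each element of results_a and results_b is the mean for one (x, y) point, in the same order as xs and ys.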
