Repeatedly run a function in parallel

Question

How do you run a function repeatedly in parallel?

For example, I have a function that takes no parameters and has a stochastic element. I want to run it multiple times, as illustrated below with a for loop. How can I do the same thing in parallel?

import numpy as np

def f():
    x = np.random.uniform()
    return x*x    

np.random.seed(1)    
a = []
for i in range(10):
    a.append(f())

This looks like a duplicate of parallel-python-just-run-function-n-times; however, the answer there doesn't quite fit, because it passes different inputs into the function. "How do I parallelize a simple Python loop?" likewise gives examples of passing different parameters into the function rather than repeating the same call.

I am on Windows 10, using Jupyter.

Details of my actual use case:

Does it produce a large volume of output per call?
Each iteration of the loop produces one number.

Do you need to keep the output? How long does each invocation take roughly?
Yes, I need to retain the numbers and it takes ~30 minutes per iteration.

How many times do you need to run it in total?
At least 100.

Do you want to parallelize across multiple machines or just multiple cores?
Currently just across multiple cores.

Recommended answer

If you don't want to pass any input to your function, give it a throwaway parameter _ that it simply ignores, then parallelise it with Pool.map as shown in the code below.

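import numpy as np
from multiprocessing.pool import Pool

def f(_):
    # The argument is ignored; it only exists so Pool.map can call f once per item.
    x = np.random.uniform()
    return x*x

# The __main__ guard is required on Windows, where worker processes are spawned and
# re-import this module. Under Jupyter on Windows, f must also live in an importable
# .py file, because spawned workers cannot see functions defined in a notebook cell.
if __name__ == "__main__":
    processes = 5   # Specify number of processes here
    with Pool(processes) as p:       # the pool is shut down when the block exits
        a = p.map(f, range(10))      # runs f 10 times; a is a list of 10 results, in order
    print(a)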
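One thing the Pool version does not carry over from the question is the effect of np.random.seed(1). Depending on how workers are started, they either get unrelated OS-seeded global RNG states (so runs are not repeatable) or copies of the same state (so different workers can return identical values). A minimal sketch of one way around this, assuming NumPy 1.17 or later for SeedSequence and default_rng, gives each task its own random stream keyed by a master seed and the task index. MASTER_SEED and run are illustrative names, not part of the original answer.

```python
import numpy as np
from multiprocessing import Pool

# Hypothetical master seed; plays the role of np.random.seed(1) in the serial loop.
MASTER_SEED = 1

def f(task_id):
    # Build an independent stream for this task from (MASTER_SEED, task_id), so the
    # value does not depend on which worker process happens to run the task.
    ss = np.random.SeedSequence(MASTER_SEED, spawn_key=(task_id,))
    rng = np.random.default_rng(ss)
    x = rng.uniform()
    return x * x

def run(n_runs, processes=5):
    # Hypothetical helper: run f once per task index across a process pool.
    with Pool(processes) as p:
        return p.map(f, range(n_runs))   # results come back in task order

if __name__ == "__main__":
    a = run(10)
    print(a)
```

Because each value depends only on (MASTER_SEED, task_id), repeated runs give the same list, and it matches a serial [f(i) for i in range(10)]. The same Windows/Jupyter caveat applies: f has to be importable by the spawned workers.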

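Update: To answer your updated question: if your tasks are lightweight or I/O bound, I recommend ThreadPool (multithreading) instead of Pool (multiprocessing), since threads avoid the cost of starting processes and pickling data between them. Threads share one interpreter and its global interpreter lock (GIL), however, so for CPU-bound work such as ~30-minute numeric computations, Pool (separate processes) is what actually spreads the work across cores, unless the heavy lifting happens in code that releases the GIL.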

Code to create a thread pool:

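from multiprocessing.pool import ThreadPool

# Reuses f(_) from the snippet above. Threads run inside the current process,
# so no __main__ guard is needed and this also works directly in Jupyter.
threads = 5
with ThreadPool(threads) as t:
    a = t.map(f, range(10))   # list of 10 results, in input order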
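For comparison, the standard library's concurrent.futures offers the same choice between processes and threads, and its submit method can call a zero-argument function directly, so the question's f needs no dummy parameter. The sketch below is an illustration under stated assumptions, not part of the original answer: it uses ProcessPoolExecutor (swapping in ThreadPoolExecutor gives threads), draws from a fresh np.random.default_rng() on every call so forked workers do not repeat each other, and collects each result as soon as its task finishes via as_completed. The run helper is a hypothetical name.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor, as_completed

def f():
    # No parameters: a fresh Generator seeded from OS entropy on every call,
    # so worker processes do not replay the same random sequence.
    x = np.random.default_rng().uniform()
    return x * x

def run(n_runs, max_workers=5):
    # Hypothetical helper: call f() n_runs times across worker processes.
    results = []
    with ProcessPoolExecutor(max_workers=max_workers) as ex:
        # submit() calls f with no arguments, so no throwaway parameter is needed.
        futures = [ex.submit(f) for _ in range(n_runs)]
        for fut in as_completed(futures):
            # Each value is handled as soon as its task finishes; saving it to disk
            # here would keep finished long runs even if a later one fails.
            results.append(fut.result())
    return results

if __name__ == "__main__":
    a = run(10)
    print(a)
```

Results arrive in completion order, not submission order. With ProcessPoolExecutor on Windows, f again has to be importable from a .py module rather than defined in a notebook cell.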
