parallel dask for loop slower than regular loop?

Question

If I try to parallelize a for loop with dask, it ends up executing slower than the regular version. Basically, I just follow the introductory example from the dask tutorial, but for some reason it's failing on my end. What am I doing wrong?

In [1]: import numpy as np
   ...: from dask import delayed, compute
   ...: import dask.multiprocessing

In [2]: a10e4 = np.random.rand(10000, 11).astype(np.float16)
   ...: b10e4 = np.random.rand(10000, 11).astype(np.float16)

In [3]: def subtract(a, b):
   ...:     return a - b

In [4]: %%timeit
   ...: results = [subtract(a10e4, b10e4[index]) for index in range(len(b10e4))]
1 loop, best of 3: 10.6 s per loop

In [5]: %%timeit
   ...: values = [delayed(subtract)(a10e4, b10e4[index]) for index in range(len(b10e4)) ]
   ...: resultsDask = compute(*values, get=dask.multiprocessing.get)
1 loop, best of 3: 14.4 s per loop

Recommended answer

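Two issues:

  1. Dask introduces about a millisecond of overhead per task. You'll want to make sure each task's computation takes significantly longer than that.
  2. When using the multiprocessing scheduler, data gets serialized between processes, which can be quite expensive. See http://dask.pydata.org/en/latest/setup.html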
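
One way to act on both points is sketched below. It is an illustration, not part of the original answer: it groups many small subtractions into a few larger tasks so the per-task overhead is amortized, and it uses the threaded scheduler so arrays are not serialized between processes. The helper `subtract_chunk`, the `chunk_size` value, and the smaller array shapes are assumptions chosen for the example, and it uses the `scheduler=` keyword of recent Dask releases instead of the `get=` argument shown in the question. Whether this is actually faster than the plain loop depends on the workload, so time it on your own data.

```python
import numpy as np
from dask import delayed, compute

# Smaller arrays than in the question, to keep the sketch quick to run.
a = np.random.rand(1000, 11).astype(np.float16)
b = np.random.rand(1000, 11).astype(np.float16)


def subtract_chunk(a, b_chunk):
    # Hypothetical helper: one task handles many rows of b,
    # so the per-task scheduling overhead is paid once per chunk.
    return [a - row for row in b_chunk]


chunk_size = 250  # assumed value; tune so each task runs well over a millisecond

# 4 tasks instead of 1000.
tasks = [
    delayed(subtract_chunk)(a, b[start:start + chunk_size])
    for start in range(0, len(b), chunk_size)
]

# The threaded scheduler shares memory between workers,
# so no data is serialized between processes.
chunks = compute(*tasks, scheduler="threads")

# Flatten the per-chunk lists back into one result per row of b.
results = [r for chunk in chunks for r in chunk]
```

The flattened `results` list has the same order and content as the list comprehension in the question, so it can be swapped in directly for comparison with `%%timeit`.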
