why is dot product in dask slower than in numpy


Problem description

A dot product in dask seems to run much slower than in numpy:

import numpy as np
x_np = np.random.normal(10, 0.1, size=(1000,100))
y_np = x_np.transpose()
%timeit x_np.dot(y_np)
# 100 loops, best of 3: 7.17 ms per loop

import dask.array as da
x_dask = da.random.normal(10, 0.1, size=(1000,100), chunks=(5,5))
y_dask = x_dask.transpose()
%timeit x_dask.dot(y_dask)
# 1 loops, best of 3: 6.56 s per loop

Does anybody know what might be the reason for that? Is there anything I'm missing here?

Answer

Adjust chunk sizes

The answer by @isternberg is correct that you should adjust chunk sizes. A good choice of chunk size follows these rules:

  1. A chunk should be small enough to fit comfortably in memory.
  2. A chunk must be large enough so that computations on that chunk take significantly longer than the roughly 1ms of overhead per task that the scheduler incurs (so 100ms-1s is a good target).
  3. Chunks should align with the computation you want to do. For example, if you plan to frequently slice along a particular dimension, then it's more efficient if your chunks are aligned so that you have to touch fewer of them.
I generally shoot for chunks that are 1-100 megabytes large. Anything smaller than that isn't helpful and usually creates enough tasks that scheduling overhead becomes our largest bottleneck.
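To see why the `(5, 5)` chunks in the question are so costly, you can inspect how many blocks (and hence tasks per operation) a chunking scheme produces. A minimal sketch, assuming dask is installed (exact timings will vary by machine):

```python
import dask.array as da

# (5, 5) chunks split a (1000, 100) array into 200 x 20 = 4000 blocks,
# so even a single elementwise operation spawns thousands of tiny tasks.
x_small_chunks = da.random.normal(10, 0.1, size=(1000, 100), chunks=(5, 5))
print(x_small_chunks.numblocks)  # (200, 20)

# Rechunking to one block per array removes nearly all scheduling overhead;
# at this size the computation is then essentially a single numpy call.
x_one_chunk = x_small_chunks.rechunk((1000, 100))
print(x_one_chunk.numblocks)  # (1, 1)
```

Each block here is only 5 * 5 * 8 bytes = 200 bytes, far below the 1-100 megabyte range suggested above, which is why scheduling overhead dominates the runtime.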

If your array is only of size (1000, 100) then there is no reason to use dask.array. Instead, use numpy and, if you really care about using multiple cores, make sure that your numpy library is linked against an efficient BLAS implementation like MKL or OpenBLAS.
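One way to check which BLAS implementation your numpy build is linked against is `numpy.show_config()` (the exact output format varies by numpy version and platform):

```python
import numpy as np

# Prints the build configuration, including the BLAS/LAPACK libraries
# numpy was compiled against (e.g. MKL, OpenBLAS, or a reference BLAS).
np.show_config()
```

If the output shows only a reference BLAS, a single-threaded `dot` is expected; installing an MKL- or OpenBLAS-backed numpy (e.g. via conda) typically speeds up dense matrix products considerably.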

If you use a multi-threaded BLAS implementation you might actually want to turn dask threading off. The two systems will clobber each other and reduce performance. If this is the case then you can turn off dask threading with the following command.

dask.set_options(get=dask.async.get_sync)

(This was the API at the time of the answer; in modern dask versions the equivalent is `dask.config.set(scheduler='synchronous')`.)

To actually time the execution of a dask.array computation you'll have to add a .compute() call to the end of the computation, otherwise you're just timing how long it takes to create the task graph, not to execute it.

In [1]: import dask.array as da

In [2]: x = da.random.normal(10, 0.1, size=(2000, 100000), chunks=(1000, 1000))  # larger example

In [3]: %time z = x.dot(x.T)  # create task graph
CPU times: user 12 ms, sys: 3.57 ms, total: 15.6 ms
Wall time: 15.3 ms

In [4]: %time _ = z.compute()  # actually do work
CPU times: user 2min 41s, sys: 841 ms, total: 2min 42s
Wall time: 21 s
