multiplication of large arrays in python


Question


I have big arrays to multiply over a large number of iterations.

I am training a model with arrays of length around 1500, and I will perform 3 multiplications about 1,000,000 times, which takes almost a week.

I found Dask and tried to compare it with the normal NumPy way, but I found NumPy faster:

import time

import numpy as np
import dask.array as da

x = np.arange(2000)

# Time 100 dot products with Dask.
start = time.time()
y = da.from_array(x, chunks=(100,))

for i in range(100):
    p = y.dot(y)

# print(p)
print(time.time() - start)

print('------------------------------')

# Time 100 dot products with plain NumPy.
start = time.time()

p = 0

for i in range(100):
    p = np.dot(x, x)

print(time.time() - start)

0.08502793312072754

0.00015974044799804688

Am I using Dask wrong, or is NumPy just that fast?

Solution

Performance for .dot strongly depends on the BLAS library to which your NumPy implementation is linked.
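
A quick way to check which BLAS your NumPy build is linked against is np.show_config() (the exact output format varies between NumPy versions):

import numpy as np

# Print NumPy's build configuration, including the BLAS/LAPACK
# libraries it was compiled against (e.g. OpenBLAS or MKL).
np.show_config()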

If you have a modern implementation like OpenBLAS or MKL then NumPy is already running at full speed using all of your cores. In this case dask.array will likely only get in the way, trying to add further parallelism when none is warranted, causing thread contention.
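
If you do want to layer Dask's parallelism on top of a multi-threaded BLAS anyway, one common way to avoid this contention is to pin the BLAS to a single thread. A minimal sketch using environment variables; these must be set before NumPy is imported, and which variable takes effect depends on the BLAS you actually have:

import os

# Limit BLAS-level threading so it does not fight with Dask's
# own parallelism. Must run before importing numpy; the relevant
# variable depends on the BLAS implementation.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np
import dask.array as da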

If you have installed NumPy through Anaconda then you likely already have OpenBLAS or MKL, so I would just be happy with the performance that you have and call it a day.

However, in your actual example you're using chunks that are far too small (chunks=(100,)). The dask task scheduler incurs about a millisecond of overhead per task. You should choose a chunksize so that each task takes somewhere in the 100s of milliseconds in order to hide this overhead. Generally a good rule of thumb is to aim for chunks that are above a megabyte in size. This is what is causing the large difference in performance that you're seeing.
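
To see the effect of chunk size, here is a minimal sketch (array length and chunk sizes are illustrative) that repeats the benchmark with megabyte-scale chunks and forces the computation with .compute():

import time

import numpy as np
import dask.array as da

x = np.arange(2_000_000, dtype='float64')   # ~16 MB array

for chunks in (1_000, 1_000_000):           # ~8 KB chunks vs ~8 MB chunks
    y = da.from_array(x, chunks=chunks)
    start = time.time()
    p = y.dot(y).compute()                  # .compute() actually runs the graph
    print(f"chunks={chunks}: {time.time() - start:.4f} s")

With the tiny chunks the scheduler has to manage thousands of tasks, so the per-task overhead dominates; with megabyte-scale chunks the same computation runs in just a handful of tasks.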
