Why is scipy.spatial.distance.sqeuclidean twice as slow as numpy.sum((y1-y2)**2)?


Problem description

Here is my code

import numpy as np
import time
from scipy.spatial import distance

y1=np.array([0,0,0,0,1,0,0,0,0,0])
y2=np.array([0. , 0.1, 0. , 0. , 0.7, 0.2, 0. , 0. , 0. , 0. ])

start_time = time.time()
for i in range(1000000):
    distance.sqeuclidean(y1,y2)
print("--- %s seconds ---" % (time.time() - start_time))

--- 15.212640523910522 seconds ---

start_time = time.time()
for i in range(1000000):
    np.sum((y1-y2)**2)
print("--- %s seconds ---" % (time.time() - start_time))

--- 8.381187438964844 seconds ---

I assumed that SciPy was optimized, so it should be faster.

Any comments will be appreciated.

Solution

Here is a more comprehensive comparison (credit to @Divakar's benchit package):

def m1(y1,y2):
  return distance.sqeuclidean(y1,y2)

def m2(y1,y2):
  return np.sum((y1-y2)**2)

in_ = {n:[np.random.rand(n), np.random.rand(n)] for n in [10,100,1000,10000,20000]}

SciPy becomes more efficient for larger arrays. For smaller arrays, the overhead of calling the function most likely outweighs its benefit. According to the source, SciPy computes np.dot(y1-y2, y1-y2).
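For readers without benchit installed, a rough version of the same comparison can be sketched with the standard-library timeit module. The helper time_per_call, the repeat counts and the seeded random generator are assumptions made for this illustration, not part of the original benchmark, and the absolute numbers will vary by machine and library version.

```python
import timeit

import numpy as np
from scipy.spatial import distance


def m1(y1, y2):
    return distance.sqeuclidean(y1, y2)


def m2(y1, y2):
    return np.sum((y1 - y2) ** 2)


def time_per_call(func, y1, y2, number=1000):
    # Hypothetical helper: best of 3 repeats, converted to seconds per call.
    best = min(timeit.repeat(lambda: func(y1, y2), number=number, repeat=3))
    return best / number


rng = np.random.default_rng(0)
timings = {}
for n in [10, 100, 1000, 10000, 20000]:
    y1, y2 = rng.random(n), rng.random(n)
    # Both functions should agree before their speed is compared.
    assert np.isclose(m1(y1, y2), m2(y1, y2))
    timings[n] = (time_per_call(m1, y1, y2), time_per_call(m2, y1, y2))
    print(f"n={n:>6}: sqeuclidean {timings[n][0]:.2e} s, np.sum {timings[n][1]:.2e} s")
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes. The array size at which the two functions cross over, if they do, depends on the installed SciPy and NumPy versions.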

If you want an even faster solution, use np.dot directly, without the overhead of the extra lines and the function call:

def m3(y1,y2):
  y_d = y1-y2
  return np.dot(y_d,y_d)
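As a quick sanity check, the sketch below confirms on the arrays from the question that the dot-product form returns the same value as the sum of squares, then times both. The call count of 100,000 is an arbitrary choice for illustration, and no particular speedup is assumed.

```python
import timeit

import numpy as np

y1 = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0])
y2 = np.array([0.0, 0.1, 0.0, 0.0, 0.7, 0.2, 0.0, 0.0, 0.0, 0.0])


def m2(y1, y2):
    return np.sum((y1 - y2) ** 2)


def m3(y1, y2):
    y_d = y1 - y2
    return np.dot(y_d, y_d)


# Nonzero differences are -0.1, 0.3 and -0.2, so the expected value
# is 0.01 + 0.09 + 0.04 = 0.14 (up to floating-point rounding).
result_sum = m2(y1, y2)
result_dot = m3(y1, y2)
print(result_sum, result_dot)

# Time each version on the small arrays; results are machine dependent.
t_sum = timeit.timeit(lambda: m2(y1, y2), number=100_000)
t_dot = timeit.timeit(lambda: m3(y1, y2), number=100_000)
print(f"np.sum: {t_sum:.3f} s, np.dot: {t_dot:.3f} s for 100000 calls")
```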
