更有效的方式来计算numpy的距离? [英] more efficient way to calculate distance in numpy?
问题描述
我有一个问题,如何以最快的速度计算numpy中的距离
i have a question on how to calculate distances in numpy as fast as it can,
def getR1(VVm,VVs,HHm,HHs):
t0=time.time()
R=VVs.flatten()[numpy.newaxis,:]-VVm.flatten()[:,numpy.newaxis]
R*=R
R1=HHs.flatten()[numpy.newaxis,:]-HHm.flatten()[:,numpy.newaxis]
R1*=R1
R+=R1
del R1
print "R1\t",time.time()-t0, R.shape, #11.7576191425 (108225, 10500)
print numpy.max(R) #4176.26290975
# uses 17.5Gb ram
return R
def getR2(VVm,VVs,HHm,HHs):
t0=time.time()
precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten()))
measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten()))
deltas = precomputed_flat[None,:,:] - measured_flat[:, None, :]
#print time.time()-t0, deltas.shape # 5.861109972 (108225, 10500, 2)
R = numpy.einsum('ijk,ijk->ij', deltas, deltas)
print "R2\t",time.time()-t0,R.shape, #14.5291359425 (108225, 10500)
print numpy.max(R) #4176.26290975
# uses 26Gb ram
return R
def getR3(VVm,VVs,HHm,HHs):
from numpy.core.umath_tests import inner1d
t0=time.time()
precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten()))
measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten()))
deltas = precomputed_flat[None,:,:] - measured_flat[:, None, :]
#print time.time()-t0, deltas.shape # 5.861109972 (108225, 10500, 2)
R = inner1d(deltas, deltas)
print "R3\t",time.time()-t0, R.shape, #12.6972110271 (108225, 10500)
print numpy.max(R) #4176.26290975
#Uses 26Gb
return R
def getR4(VVm,VVs,HHm,HHs):
from scipy.spatial.distance import cdist
t0=time.time()
precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten()))
measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten()))
R=spdist.cdist(precomputed_flat,measured_flat, 'sqeuclidean') #.T
print "R4\t",time.time()-t0, R.shape, #17.7022118568 (108225, 10500)
print numpy.max(R) #4176.26290975
# uses 9 Gb ram
return R
def getR5(VVm,VVs,HHm,HHs):
from scipy.spatial.distance import cdist
t0=time.time()
precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten()))
measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten()))
R=spdist.cdist(precomputed_flat,measured_flat, 'euclidean') #.T
print "R5\t",time.time()-t0, R.shape, #15.6070930958 (108225, 10500)
print numpy.max(R) #64.6240118667
# uses only 9 Gb ram
return R
def getR6(VVm,VVs,HHm,HHs):
from scipy.weave import blitz
t0=time.time()
R=VVs.flatten()[numpy.newaxis,:]-VVm.flatten()[:,numpy.newaxis]
blitz("R=R*R") # R*=R
R1=HHs.flatten()[numpy.newaxis,:]-HHm.flatten()[:,numpy.newaxis]
blitz("R1=R1*R1") # R1*=R1
blitz("R=R+R1") # R+=R1
del R1
print "R6\t",time.time()-t0, R.shape, #11.7576191425 (108225, 10500)
print numpy.max(R) #4176.26290975
return R
在以下时间产生结果:
R1 11.7737319469 (108225, 10500) 4909.66881791
R2 15.1279799938 (108225, 10500) 4909.66881791
R3 12.7408981323 (108225, 10500) 4909.66881791
R4 17.3336868286 (10500, 108225) 4909.66881791
R5 15.7530870438 (10500, 108225) 70.0690289494
R6 11.670968771 (108225, 10500) 4909.66881791
尽管最后一个给出的是sqrt((VVm-VVs)^ 2 +(HHm-HHs)^ 2),而其他的给出的是(VVm-VVs)^ 2 +(HHm-HHs)^ 2,但这不是真的很重要,因为否则在我的代码中,我对每个i取R [i ,:]的最小值,而sqrt始终不会影响最小值,(如果我对距离感兴趣,我就取sqrt(value ),而不是对整个数组执行sqrt,因此实际上没有时序差异.
While the last one gives sqrt((VVm-VVs)^2+(HHm-HHs)^2), while the others give (VVm-VVs)^2+(HHm-HHs)^2, This is not really important, since otherwise further on in my code i take the minimum of R[i,:] for each i, and sqrt doesnt influence the minimum value anyways, (and if i am interested in the distance, i just take sqrt(value), instead of doing the sqrt over the entire array, so there is really no timing difference due to that.
问题仍然存在:第一个解决方案为什么是最好的(第二个和第三个解决方案较慢的原因是因为deltas = ...花费5.8秒,(这也是这两个方法花费26Gb的原因)),并且为什么千金刚石要比欧几里德慢?
The question remains: how come the first solution is the best, (the reason the second and third are slower is because deltas=... takes 5.8seconds, (which is also why those two methods take 26Gb)), And why is the sqeuclidean slower than the euclidean?
sqeuclidean应该只做(VVm-VVs)^ 2 +(HHm-HHs)^ 2,而我认为它做的事情有所不同.有人知道如何找到该方法的源代码(C或底部的任何内容)吗?我认为它确实sqrt((VVm-VVs)^ 2 +(HHm-HHs)^ 2)^ 2(我能想到为什么它会比(VVm-VVs)^ 2 +(HHm-HHs)慢的唯一原因^ 2-我知道这是一个愚蠢的原因,有人有一个更合乎逻辑的理由吗?)
sqeuclidean should just do (VVm-VVs)^2+(HHm-HHs)^2, while i think it does something different. Anyone know how to find the sourcecode (C or whatever is at the bottom) of that method? I think it does sqrt((VVm-VVs)^2+(HHm-HHs)^2)^2 (the only reason i can think why it would be slower than (VVm-VVs)^2+(HHm-HHs)^2 - I know its a stupid reason, anyone got a more logical one?)
由于我对C一无所知,我该如何用scipy.weave内联?并且该代码是否可以像您使用python一样正常编译?还是我需要为此安装特殊的东西?
Since i know nothing of C, how would i inline this with scipy.weave? and is that code compilable normally like you do with python? or do i need special stuff installed for that?
好的,我使用scipy.weave.blitz(R6方法)进行了尝试,并且速度稍快一些,但是我认为知道C比我更多的人仍然可以提高此速度吗?我只是采用了形式为a + = b或* =的行,并查看它们在C中的状态,然后将它们放入blitz语句中,但是我想我是否应该在其中使用带有flatten和newaxis的语句行同样,C也应该更快一些,但是我不知道该怎么做(知道C的人可能会解释吗?).现在,闪电战的东西和我的第一种方法之间的差异还不够大,无法真正由C vs numpy引起吗?
ok, i tried it with scipy.weave.blitz, (R6 method), and that is slightly faster, but i assume someone who knows more C than me can still improve this speed? I just took the lines which are of the form a+=b or *=, and looked up how they would be in C, and put them in the blitz statement, but i guess if i put lines with the statements with flatten and newaxis in C as well, that it should go faster too, but i dont know how i can do that (someone who knows C maybe explain?). Right now, the difference between the stuff with blitz and my first method are not big enough to really be caused by C vs numpy i guess?
我想其他类似deltas = ...的方法也可以快得多,当我将其放入C语言时?
I guess the other methods like with deltas=... can go much faster too, when i would put it in C ?
推荐答案
每当有乘法和和时,请尝试使用点积函数或np.einsum
之一.由于您要预先分配数组,而不要为水平和垂直坐标使用不同的数组,因此将它们堆叠在一起:
Whenever you have multiplications and sums, try to use one of the dot product functions or np.einsum
. Since you are preallocating your arrays, rather than having different arrays for horizontal and vertical coordinates, stack them both together:
precomputed_flat = np.column_stack((svf.flatten(), shf.flatten()))
measured_flat = np.column_stack((VVmeasured.flatten(), HHmeasured.flatten()))
deltas = precomputed_flat - measured_flat[:, None, :]
从这里开始,最简单的是:
From here, the simplest would be:
dist = np.einsum('ijk,ijk->ij', deltas, deltas)
您也可以尝试以下方法:
You could also try something like:
from numpy.core.umath_tests import inner1d
dist = inner1d(deltas, deltas)
当然也有SciPy的空间模块 cdist
:
There is of course also SciPy's spatial module cdist
:
from scipy.spatial.distance import cdist
dist = cdist(precomputed_flat, measured_flat, 'euclidean')
编辑 我无法在如此大的数据集上运行测试,但这些时间颇具启发性:
EDIT I cannot run tests on such a large dataset, but these timings are rather enlightening:
len_a, len_b = 10000, 1000
a = np.random.rand(2, len_a)
b = np.random.rand(2, len_b)
c = np.random.rand(len_a, 2)
d = np.random.rand(len_b, 2)
In [3]: %timeit a[:, None, :] - b[..., None]
10 loops, best of 3: 76.7 ms per loop
In [4]: %timeit c[:, None, :] - d
1 loops, best of 3: 221 ms per loop
对于上述较小的数据集,通过在内存中以不同的方式排列数据,我可以使用scipy.spatial.distance.cdist
稍微加快您的方法并使其与inner1d
匹配:
For the above smaller dataset, I can get a slight speed up over your method with scipy.spatial.distance.cdist
and match it with inner1d
, by arranging data differently in memory:
precomputed_flat = np.vstack((svf.flatten(), shf.flatten()))
measured_flat = np.vstack((VVmeasured.flatten(), HHmeasured.flatten()))
deltas = precomputed_flat[:, None, :] - measured_flat
import scipy.spatial.distance as spdist
from numpy.core.umath_tests import inner1d
In [13]: %timeit r0 = a[0, None, :] - b[0, :, None]; r1 = a[1, None, :] - b[1, :, None]; r0 *= r0; r1 *= r1; r0 += r1
10 loops, best of 3: 146 ms per loop
In [14]: %timeit deltas = (a[:, None, :] - b[..., None]).T; inner1d(deltas, deltas)
10 loops, best of 3: 145 ms per loop
In [15]: %timeit spdist.cdist(a.T, b.T)
10 loops, best of 3: 124 ms per loop
In [16]: %timeit deltas = a[:, None, :] - b[..., None]; np.einsum('ijk,ijk->jk', deltas, deltas)
10 loops, best of 3: 163 ms per loop
这篇关于更有效的方式来计算numpy的距离?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!