更有效的方式来计算numpy的距离? [英] more efficient way to calculate distance in numpy?

查看:70
本文介绍了更有效的方式来计算numpy的距离?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个问题,如何以最快的速度计算numpy中的距离

i have a question on how to calculate distances in numpy as fast as it can,

def getR1(VVm,VVs,HHm,HHs):
    t0=time.time()
    R=VVs.flatten()[numpy.newaxis,:]-VVm.flatten()[:,numpy.newaxis]
    R*=R
    R1=HHs.flatten()[numpy.newaxis,:]-HHm.flatten()[:,numpy.newaxis]
    R1*=R1
    R+=R1
    del R1
    print "R1\t",time.time()-t0, R.shape, #11.7576191425 (108225, 10500) 
    print numpy.max(R) #4176.26290975
    # uses 17.5Gb ram
    return R


def getR2(VVm,VVs,HHm,HHs):
    t0=time.time()
    precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten()))
    measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten()))
    deltas = precomputed_flat[None,:,:] - measured_flat[:, None, :]
    #print time.time()-t0, deltas.shape # 5.861109972 (108225, 10500, 2)
    R = numpy.einsum('ijk,ijk->ij', deltas, deltas)
    print "R2\t",time.time()-t0,R.shape, #14.5291359425 (108225, 10500)
    print numpy.max(R) #4176.26290975
    # uses 26Gb ram
    return R


def getR3(VVm,VVs,HHm,HHs):
    from numpy.core.umath_tests import inner1d
    t0=time.time()
    precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten()))
    measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten()))
    deltas = precomputed_flat[None,:,:] - measured_flat[:, None, :]
    #print time.time()-t0, deltas.shape # 5.861109972 (108225, 10500, 2)
    R = inner1d(deltas, deltas)
    print "R3\t",time.time()-t0, R.shape, #12.6972110271 (108225, 10500)
    print numpy.max(R) #4176.26290975
    #Uses 26Gb
    return R


def getR4(VVm,VVs,HHm,HHs):
    from scipy.spatial.distance import cdist
    t0=time.time()
    precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten()))
    measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten()))
    R=spdist.cdist(precomputed_flat,measured_flat, 'sqeuclidean') #.T
    print "R4\t",time.time()-t0, R.shape, #17.7022118568 (108225, 10500)
    print numpy.max(R) #4176.26290975
    # uses 9 Gb ram
    return R

def getR5(VVm,VVs,HHm,HHs):
    from scipy.spatial.distance import cdist
    t0=time.time()
    precomputed_flat = numpy.column_stack((VVs.flatten(), HHs.flatten()))
    measured_flat = numpy.column_stack((VVm.flatten(), HHm.flatten()))
    R=spdist.cdist(precomputed_flat,measured_flat, 'euclidean') #.T
    print "R5\t",time.time()-t0, R.shape, #15.6070930958 (108225, 10500)
    print numpy.max(R) #64.6240118667
    # uses only 9 Gb ram
    return R

def getR6(VVm,VVs,HHm,HHs):
    from scipy.weave import blitz
    t0=time.time()
    R=VVs.flatten()[numpy.newaxis,:]-VVm.flatten()[:,numpy.newaxis]
    blitz("R=R*R") # R*=R
    R1=HHs.flatten()[numpy.newaxis,:]-HHm.flatten()[:,numpy.newaxis]
    blitz("R1=R1*R1") # R1*=R1
    blitz("R=R+R1") # R+=R1
    del R1
    print "R6\t",time.time()-t0, R.shape, #11.7576191425 (108225, 10500) 
    print numpy.max(R) #4176.26290975
    return R

在以下时间产生结果:

R1  11.7737319469 (108225, 10500) 4909.66881791
R2  15.1279799938 (108225, 10500) 4909.66881791
R3  12.7408981323 (108225, 10500) 4909.66881791
R4  17.3336868286 (10500, 108225) 4909.66881791
R5  15.7530870438 (10500, 108225) 70.0690289494
R6  11.670968771 (108225, 10500) 4909.66881791

尽管最后一个给出的是sqrt((VVm-VVs)^ 2 +(HHm-HHs)^ 2),而其他的给出的是(VVm-VVs)^ 2 +(HHm-HHs)^ 2,但这不是真的很重要,因为否则在我的代码中,我对每个i取R [i ,:]的最小值,而sqrt始终不会影响最小值,(如果我对距离感兴趣,我就取sqrt(value ),而不是对整个数组执行sqrt,因此实际上没有时序差异.

While the last one gives sqrt((VVm-VVs)^2+(HHm-HHs)^2), while the others give (VVm-VVs)^2+(HHm-HHs)^2, This is not really important, since otherwise further on in my code i take the minimum of R[i,:] for each i, and sqrt doesnt influence the minimum value anyways, (and if i am interested in the distance, i just take sqrt(value), instead of doing the sqrt over the entire array, so there is really no timing difference due to that.

问题仍然存在:第一个解决方案为什么是最好的(第二个和第三个解决方案较慢的原因是因为deltas = ...花费5.8秒,(这也是这两个方法花费26Gb的原因)),并且为什么千金刚石要比欧几里德慢?

The question remains: how come the first solution is the best, (the reason the second and third are slower is because deltas=... takes 5.8seconds, (which is also why those two methods take 26Gb)), And why is the sqeuclidean slower than the euclidean?

sqeuclidean应该只做(VVm-VVs)^ 2 +(HHm-HHs)^ 2,而我认为它做的事情有所不同.有人知道如何找到该方法的源代码(C或底部的任何内容)吗?我认为它确实sqrt((VVm-VVs)^ 2 +(HHm-HHs)^ 2)^ 2(我能想到为什么它会比(VVm-VVs)^ 2 +(HHm-HHs)慢的唯一原因^ 2-我知道这是一个愚蠢的原因,有人有一个更合乎逻辑的理由吗?)

sqeuclidean should just do (VVm-VVs)^2+(HHm-HHs)^2, while i think it does something different. Anyone know how to find the sourcecode (C or whatever is at the bottom) of that method? I think it does sqrt((VVm-VVs)^2+(HHm-HHs)^2)^2 (the only reason i can think why it would be slower than (VVm-VVs)^2+(HHm-HHs)^2 - I know its a stupid reason, anyone got a more logical one?)

由于我对C一无所知,我该如何用scipy.weave内联?并且该代码是否可以像您使用python一样正常编译?还是我需要为此安装特殊的东西?

Since i know nothing of C, how would i inline this with scipy.weave? and is that code compilable normally like you do with python? or do i need special stuff installed for that?

好的,我使用scipy.weave.blitz(R6方法)进行了尝试,并且速度稍快一些,但是我认为知道C比我更多的人仍然可以提高此速度吗?我只是采用了形式为a + = b或* =的行,并查看它们在C中的状态,然后将它们放入blitz语句中,但是我想我是否应该在其中使用带有flatten和newaxis的语句行同样,C也应该更快一些,但是我不知道该怎么做(知道C的人可能会解释吗?).现在,闪电战的东西和我的第一种方法之间的差异还不够大,无法真正由C vs numpy引起吗?

ok, i tried it with scipy.weave.blitz, (R6 method), and that is slightly faster, but i assume someone who knows more C than me can still improve this speed? I just took the lines which are of the form a+=b or *=, and looked up how they would be in C, and put them in the blitz statement, but i guess if i put lines with the statements with flatten and newaxis in C as well, that it should go faster too, but i dont know how i can do that (someone who knows C maybe explain?). Right now, the difference between the stuff with blitz and my first method are not big enough to really be caused by C vs numpy i guess?

我想其他类似deltas = ...的方法也可以快得多,当我将其放入C语言时?

I guess the other methods like with deltas=... can go much faster too, when i would put it in C ?

推荐答案

每当有乘法和和时,请尝试使用点积函数或np.einsum之一.由于您要预先分配数组,而不要为水平和垂直坐标使用不同的数组,因此将它们堆叠在一起:

Whenever you have multiplications and sums, try to use one of the dot product functions or np.einsum. Since you are preallocating your arrays, rather than having different arrays for horizontal and vertical coordinates, stack them both together:

precomputed_flat = np.column_stack((svf.flatten(), shf.flatten()))
measured_flat = np.column_stack((VVmeasured.flatten(), HHmeasured.flatten()))
deltas = precomputed_flat - measured_flat[:, None, :]

从这里开始,最简单的是:

From here, the simplest would be:

dist = np.einsum('ijk,ijk->ij', deltas, deltas)

您也可以尝试以下方法:

You could also try something like:

from numpy.core.umath_tests import inner1d
dist = inner1d(deltas, deltas)


当然也有SciPy的空间模块 cdist :


There is of course also SciPy's spatial module cdist:

from scipy.spatial.distance import cdist
dist = cdist(precomputed_flat, measured_flat, 'euclidean')


编辑 我无法在如此大的数据集上运行测试,但这些时间颇具启发性:


EDIT I cannot run tests on such a large dataset, but these timings are rather enlightening:

len_a, len_b = 10000, 1000

a = np.random.rand(2, len_a)
b =  np.random.rand(2, len_b)
c = np.random.rand(len_a, 2)
d = np.random.rand(len_b, 2)

In [3]: %timeit a[:, None, :] - b[..., None]
10 loops, best of 3: 76.7 ms per loop

In [4]: %timeit c[:, None, :] - d
1 loops, best of 3: 221 ms per loop

对于上述较小的数据集,通过在内存中以不同的方式排列数据,我可以使用scipy.spatial.distance.cdist稍微加快您的方法并使其与inner1d匹配:

For the above smaller dataset, I can get a slight speed up over your method with scipy.spatial.distance.cdist and match it with inner1d, by arranging data differently in memory:

precomputed_flat = np.vstack((svf.flatten(), shf.flatten()))
measured_flat = np.vstack((VVmeasured.flatten(), HHmeasured.flatten()))
deltas = precomputed_flat[:, None, :] - measured_flat

import scipy.spatial.distance as spdist
from numpy.core.umath_tests import inner1d

In [13]: %timeit r0 = a[0, None, :] - b[0, :, None]; r1 = a[1, None, :] - b[1, :, None]; r0 *= r0; r1 *= r1; r0 += r1
10 loops, best of 3: 146 ms per loop

In [14]: %timeit deltas = (a[:, None, :] - b[..., None]).T; inner1d(deltas, deltas)
10 loops, best of 3: 145 ms per loop

In [15]: %timeit spdist.cdist(a.T, b.T)
10 loops, best of 3: 124 ms per loop

In [16]: %timeit deltas = a[:, None, :] - b[..., None]; np.einsum('ijk,ijk->jk', deltas, deltas)
10 loops, best of 3: 163 ms per loop

这篇关于更有效的方式来计算numpy的距离?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆