计算python中每个点之间的距离的最快方法 [英] Fastest way to compute distance beetween each points in python

查看:1000
本文介绍了计算python中每个点之间的距离的最快方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在我的项目中,我需要计算存储在数组中的每个点之间的欧几里得距离. 入口数组是一个二维的numpy数组,具有3列,分别是坐标(x,y,z),每行定义一个新点.

In my project I need to compute euclidian distance beetween each points stored in an array. The entry array is a 2D numpy array with 3 columns which are the coordinates(x,y,z) and each rows define a new point.

我通常在测试用例中使用5000-6000分.

I'm usualy working with 5000 - 6000 points in my test cases.

我的第一个算法使用Cython,而我的第二个numpy.我发现我的numpy算法比cython更快.

My first algorithm use Cython and my second numpy. I find that my numpy algorithm is faster than cython.

6000点:

numpy 1.76 s/cython 4.36 s

numpy 1.76 s / cython 4.36 s

这是我的cython代码:

Here's my cython code:

cimport cython
from libc.math cimport sqrt
@cython.boundscheck(False)
@cython.wraparound(False)
cdef void calcul1(double[::1] M,double[::1] R):

  cdef int i=0
  cdef int max = M.shape[0]
  cdef int x,y
  cdef int start = 1

  for x in range(0,max,3):
     for y in range(start,max,3):

        R[i]= sqrt((M[y] - M[x])**2 + (M[y+1] - M[x+1])**2 + (M[y+2] - M[x+2])**2)
        i+=1  

     start += 1

M是初始条目数组的内存视图,但在调用函数calcul1()之前由numpy表示flatten(),R是用于存储所有结果的一维输出数组的内存视图.

M is a memory view of the initial entry array but flatten() by numpy before the call of the function calcul1(), R is a memory view of a 1D output array to store all the results.

这是我的Numpy代码:

Here's my Numpy code :

def calcul2(M):

     return np.sqrt(((M[:,:,np.newaxis] - M[:,np.newaxis,:])**2).sum(axis=0))

此处M是初始输入数组,但在函数调用前以numpy表示transpose(),以坐标(x,y,z)为行,点为列.

Here M is the initial entry array but transpose() by numpy before the function call to have coordinates(x,y,z) as rows and points as columns.

此外,此numpy函数非常方便,因为它返回的数组组织良好.这是一个n x n的数组,其中点的数量为n,每个点都有一行和一列.因此,例如,距离AB存储在A行和B列的交点索引处.

Moreover this numpy function is quite convinient because the array it returns is well organise. It's a n by n array with n the number of points and each points has a row and a column. So for example the distance AB is stored at the intersection index of row A and column B.

这是我如何称呼它们(cython函数):

Here's how I call them (cython function):

cpdef test():

  cdef double[::1] Mf 
  cdef double[::1] out = np.empty(17998000,dtype=np.float64) # (6000² - 6000) / 2

  M = np.arange(6000*3,dtype=np.float64).reshape(6000,3) # Example array with 6000 points
  Mf = M.flatten() #because my cython algorithm need a 1D array
  Mt = M.transpose() # because my numpy algorithm need coordinates as rows

  calcul2(Mt)

  calcul1(Mf,out)

我在这里做错什么了吗?对于我的项目来说,两者都不够快.

Am I doing something wrong here ? For my project both are not fast enough.

1:有没有一种方法可以改善我的cython代码,从而击败numpy的速度?

1: Is there a way to improve my cython code in order to beat numpy's speed ?

2:有没有一种方法可以改善我的numpy代码以使其计算更快?

2: Is there a way to improve my numpy code to compute even faster ?

3:或任何其他解决方案,但必须是python/cython(如并行计算)吗?

3: Or any other solutions, but it must be a python/cython (like parallel computing) ?

谢谢.

推荐答案

不确定您要从何处获取时间,但是可以使用

Not sure where you are getting your timings, but you can use scipy.spatial.distance:

M = np.arange(6000*3, dtype=np.float64).reshape(6000,3)
np_result = calcul2(M)
sp_result = sd.cdist(M.T, M.T) #Scipy usage
np.allclose(np_result, sp_result)
>>> True

时间:

%timeit calcul2(M)
1000 loops, best of 3: 313 µs per loop

%timeit sd.cdist(M.T, M.T)
10000 loops, best of 3: 86.4 µs per loop

重要的是,认识到您的输出是对称的也很有用:

Importantly, its also useful to realize that your output is symmetric:

np.allclose(sp_result, sp_result.T)
>>> True

一种替代方法是仅计算此数组的上三角:

An alternative is to only compute the upper triangular of this array:

%timeit sd.pdist(M.T)
10000 loops, best of 3: 39.1 µs per loop

不确定要压缩哪个索引,看起来您可能同时做这两种方式?压缩其他索引以进行比较:

Not sure which index you want to zip, looks like you may be doing it both ways? Zipping the other index for comparison:

%timeit sd.pdist(M)
10 loops, best of 3: 135 ms per loop

仍然比您当前的NumPy实现快约10倍.

Still about 10x faster than your current NumPy implementation.

这篇关于计算python中每个点之间的距离的最快方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆