向量化欧氏距离计算 - NumPy [英] Vectorizing euclidean distance computation - NumPy

查看:92
本文介绍了向量化欧氏距离计算 - NumPy的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的问题是关于我的代码的矢量化.我有一个保存 3D 坐标的数组和一个保存连接坐标的边信息的数组:

在 [8]:coords出[8]:数组([[ 11.22727013, 24.72620964, 2.02986932],[ 11.23895836, 24.67577744, 2.04130101],[ 11.23624039, 24.63677788, 2.04096866],[ 11.22516632, 24.5986824, 2.04045677],[ 11.21166992, 24.56095695, 2.03898215],[ 11.20334721, 24.5227356, 2.03556442],[ 11.2064085, 24.48479462, 2.03098583],[ 11.22059727, 24.44837189, 2.02649784],[ 11.24213409, 24.41513252, 2.01979685]])在 [13] 中:边缘出[13]:数组([[0, 1],[1, 2],[2, 3],[3, 4],[4, 5],[5, 6],[6, 7],[7, 8],], dtype=int32)

现在,我想计算边数组中坐标之间的欧几里得距离的总和.例如.coords[0] 到 coords[1] 的距离 + coords[1] 到 coords[2] 的距离 .....

我有以下代码,可以完成这项工作:

def networkLength(coords, edge):从 scipy.spatial 导入距离距离网络 = np.array([])对于 i 在范围内(edges.shape[0]):distancesNetwork = np.append(distancesNetwork, distance.euclidean(coords[edges[i, 0]], coords[edges[i, 1]]))返回总和(距离网络)

我想知道是否可以对代码进行矢量化,而不是进行循环.蟒蛇的方法是什么?非常感谢!!

解决方案

方法 #1

我们可以将第一列和第二列一起切出以索引到coords,而不是沿它们迭代每个元素并执行欧几里德距离计算,其中涉及沿每行的元素平方和求和以及然后得到按元素的平方根.最后,我们需要对原始代码中显示的一个标量的所有这些值求和.

因此,一种矢量化实现将是 -

np.sqrt(((coords[edges[:, 0]] - coords[edges[:, 1]])**2).sum(1)).sum()

NumPy 内置了一个用于执行这些距离计算操作的距离计算操作 np.linalg.norm.在性能方面,我认为它会与我们之前列出的相媲美.为了完整起见,实现将是 -

np.linalg.norm(coords[edges[:, 0]] - coords[edges[:, 1]],axis=1).sum()

方法#2

调整之前的方法,我们可以使用 np.einsum 在一个步骤中将同时执行平方沿每一行求和,因此效率会更高一些.>

实现看起来像这样 -

s = coords[edges[:, 0]] - coords[edges[:, 1]]out = np.sqrt(np.einsum('ij,ij->i',s,s)).sum()

<小时>

运行时测试

函数定义 -

def networkLength(coords, edge): # 问题的原始代码距离网络 = np.array([])对于 i 在范围内(edges.shape[0]):distancesNetwork = np.append(distancesNetwork, \distance.euclidean(coords[edges[i, 0]], coords[edges[i, 1]]))返回总和(距离网络)def vectorized_app1(坐标,边):return np.sqrt(((coords[edges[:, 0]] - coords[edges[:, 1]])**2).sum(1)).sum()def vectorized_app2(坐标,边):s = coords[edges[:, 0]] - coords[edges[:, 1]]return np.sqrt(np.einsum('ij,ij->i',s,s)).sum()

验证和时间安排 -

In [114]: # 设置更大的输入...:坐标 = np.random.rand(100,3)...:边 = np.random.randint(0,100,(10000,2))# 验证所有方法的结果在 [115] 中:networkLength(坐标,边)出[115]:6607.8829431403547在 [116]: vectorized_app1(coords,edges)出[116]:6607.8829431403337在 [117]: vectorized_app2(coords,edges)出[117]:6607.8829431403337在 [118]: %timeit networkLength(coords, edge)...:%timeit vectorized_app1(坐标,边)...:%timeit vectorized_app2(坐标,边)...:1 个循环,最好的 3 个:每个循环 519 毫秒1000 个循环,最好的 3 个:每个循环 822 µs1000 个循环,最好的 3 个:每个循环 668 µs

my question regards the vectorization of my code. I have one array that holds 3D-coordinates and one array that holds the information of edges that connect the coordinates:

In [8]:coords
Out[8]: 
array([[ 11.22727013,  24.72620964,   2.02986932],
       [ 11.23895836,  24.67577744,   2.04130101],
       [ 11.23624039,  24.63677788,   2.04096866],
       [ 11.22516632,  24.5986824 ,   2.04045677],
       [ 11.21166992,  24.56095695,   2.03898215],
       [ 11.20334721,  24.5227356 ,   2.03556442],
       [ 11.2064085 ,  24.48479462,   2.03098583],
       [ 11.22059727,  24.44837189,   2.02649784],
       [ 11.24213409,  24.41513252,   2.01979685]])

In [13]:edges
Out[13]: 
array([[0, 1],
       [1, 2],
       [2, 3],
       [3, 4],
       [4, 5],
       [5, 6],
       [6, 7],
       [7, 8],], dtype=int32)

Now, I would like to calculate the sum of the euclidian distance between the coordinates in the edges array. E.g. Distance from coords[0] to coords[1] + distance from coords[1] to coords[2] .....

I have the following code, which does the job:

def networkLength(coords, edges):

   from scipy.spatial import distance 
   distancesNetwork = np.array([])    

   for i in range(edges.shape[0]):
        distancesNetwork = np.append(distancesNetwork, distance.euclidean(coords[edges[i, 0]], coords[edges[i, 1]]))

   return sum(distancesNetwork)

I was wondering whether it is possible to vectorize the code, rather than doing a loop. What is the pythonian way to do it? Thanks a lot!!

解决方案

Approach #1

We could slice out the first and second columns altogether for indexing into coords instead of iterating for each element along them and perform the euclidean distance computations that involves element-wise squaring and summing along each row and then getting the element-wise square-root. Finally, we need to sum all those values for one scalar as shown in the original code.

Thus, one vectorized implementation would be -

np.sqrt(((coords[edges[:, 0]] - coords[edges[:, 1]])**2).sum(1)).sum()

There's a built-in in NumPy to do those distance computing operations as np.linalg.norm. In terms of performance, I would think it would be comparable to what we have just listed earlier. For the sake of completeness, the implementation would be -

np.linalg.norm(coords[edges[:, 0]] - coords[edges[:, 1]],axis=1).sum()

Approach #2

Tweaking the earlier approach, we could use np.einsum that in one step would perform both squaring and summing along each row and as such would be a bit more efficient.

The implementation would look something like this -

s = coords[edges[:, 0]] - coords[edges[:, 1]]
out = np.sqrt(np.einsum('ij,ij->i',s,s)).sum()


Runtime test

Function definitions -

def networkLength(coords, edges): # Original code from question
   distancesNetwork = np.array([])    
   for i in range(edges.shape[0]):
        distancesNetwork = np.append(distancesNetwork, \
        distance.euclidean(coords[edges[i, 0]], coords[edges[i, 1]]))
   return sum(distancesNetwork)

def vectorized_app1(coords, edges):
    return np.sqrt(((coords[edges[:, 0]] - coords[edges[:, 1]])**2).sum(1)).sum()

def vectorized_app2(coords, edges):
    s = coords[edges[:, 0]] - coords[edges[:, 1]]
    return np.sqrt(np.einsum('ij,ij->i',s,s)).sum()

Verification and Timings -

In [114]: # Setup bigger inputs
     ...: coords = np.random.rand(100,3)
     ...: edges = np.random.randint(0,100,(10000,2))

# Verify results across all approaches
In [115]: networkLength(coords, edges)
Out[115]: 6607.8829431403547

In [116]: vectorized_app1(coords, edges)
Out[116]: 6607.8829431403337

In [117]: vectorized_app2(coords, edges)
Out[117]: 6607.8829431403337

In [118]: %timeit networkLength(coords, edges)
     ...: %timeit vectorized_app1(coords, edges)
     ...: %timeit vectorized_app2(coords, edges)
     ...: 
1 loops, best of 3: 519 ms per loop
1000 loops, best of 3: 822 µs per loop
1000 loops, best of 3: 668 µs per loop

这篇关于向量化欧氏距离计算 - NumPy的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆