Efficiently Calculating a Euclidean Distance Matrix Using Numpy

Problem description

I have a set of points in 2-dimensional space and need to calculate the distance from each point to every other point.

I have a relatively small number of points, maybe at most 100. But since I need to do it often and rapidly in order to determine the relationships between these moving points, and since I'm aware that iterating through the points could be as bad as O(n^2) complexity, I'm looking for ways to take advantage of numpy's matrix magic (or scipy).

As it stands in my code, the coordinates of each object are stored in its class. However, I could also update them in a numpy array when I update the class coordinate.

class Cell(object):
    """Represents one object in the field."""
    def __init__(self,id,x=0,y=0):
        self.m_id = id
        self.m_x = x
        self.m_y = y

It occurs to me to create a Euclidean distance matrix to prevent duplication, but perhaps you have a cleverer data structure.

I'm open to pointers to nifty algorithms as well.

Also, I note that there are similar questions dealing with Euclidean distance and numpy, but I didn't find any that directly address this question of efficiently populating a full distance matrix.

Solution

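You can take advantage of the complex type: store each point (x, y) as the complex number x + yj, so that the absolute value of the difference of two such numbers is the Euclidean distance between the two points.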

import numpy as np
# build a complex array of your cells (x -> real part, y -> imaginary part)
z = np.array([complex(c.m_x, c.m_y) for c in cells])
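
To see why the complex encoding works, here is a minimal check that the modulus of a complex difference matches the usual distance formula. The two points and the names p, q, d_complex and d_direct are made up for illustration and are not part of the original code.

```python
import math

# Two hypothetical points, (0, 0) and (3, 4), encoded as x + yj.
p = complex(0, 0)
q = complex(3, 4)

# |q - p| is sqrt(dx**2 + dy**2), i.e. the Euclidean distance.
d_complex = abs(q - p)

# The same distance computed directly from the coordinates.
d_direct = math.hypot(q.real - p.real, q.imag - p.imag)

print(d_complex, d_direct)  # 5.0 5.0
```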

First solution

# mesh this array so that you will have all combinations
m, n = np.meshgrid(z, z)
# get the distance via the norm
out = abs(m-n)
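
As a self-contained sketch of the meshgrid approach, the snippet below uses three sample points (the same values as in the example further down) instead of a list of Cell objects. With the default indexing of np.meshgrid, m repeats z along the rows and n repeats it along the columns, so m - n covers every pair.

```python
import numpy as np

# Three sample points encoded as complex numbers (illustrative values).
z = np.array([0 + 0j, 2 + 1j, -1 + 4j])

# With the default indexing, m[i, j] == z[j] and n[i, j] == z[i].
m, n = np.meshgrid(z, z)

# out[i, j] is the distance between point i and point j.
out = abs(m - n)

print(out.shape)  # (3, 3)
```

The cost is two temporary n-by-n arrays (m and n) in addition to the result.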

Second solution

Meshing is the main idea, but numpy's broadcasting means you don't have to generate m and n explicitly. Just subtract z from a column-shaped view of z (shape (n, 1) against shape (n,)); the result is broadcast to the full n-by-n grid automatically:

out = abs(z[..., np.newaxis] - z)
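
The shapes involved may make the broadcasting step easier to follow. This is a small sketch with three sample points; the name col is introduced here only for illustration.

```python
import numpy as np

# Three sample points encoded as complex numbers (illustrative values).
z = np.array([0 + 0j, 2 + 1j, -1 + 4j])

# Adding a trailing axis turns the 1-D array into a column of shape (3, 1).
col = z[..., np.newaxis]

# (3, 1) minus (3,) broadcasts to (3, 3): entry [i, j] is z[i] - z[j].
out = abs(col - z)

print(col.shape, z.shape, out.shape)  # (3, 1) (3,) (3, 3)
```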

Third solution

And if z is created directly as a 2-dimensional array with a single row, you can use z.T instead of the weird z[..., np.newaxis]. So finally, your code will look like this:

z = np.array([[complex(c.m_x, c.m_y) for c in cells]]) # notice the [[ ... ]]
out = abs(z.T-z)
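
Putting this together with the Cell class from the question, a complete version might look like the sketch below. The helper name distance_matrix and the sample cells are assumptions made for illustration; they do not appear in the original code.

```python
import numpy as np


class Cell(object):
    """Represents one object in the field."""
    def __init__(self, id, x=0, y=0):
        self.m_id = id
        self.m_x = x
        self.m_y = y


def distance_matrix(cells):
    """Hypothetical helper: pairwise Euclidean distances between cells."""
    # The double brackets make z a single-row 2-D array of shape (1, n).
    z = np.array([[complex(c.m_x, c.m_y) for c in cells]])
    # z.T has shape (n, 1); broadcasting gives the full (n, n) matrix.
    return abs(z.T - z)


# Sample cells with made-up coordinates.
cells = [Cell(0, 0, 0), Cell(1, 2, 1), Cell(2, -1, 4)]
dist = distance_matrix(cells)
print(dist)
```

Row i and column j of dist follow the order of the cells list, so dist[i, j] is the distance between cells[i] and cells[j].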

Example

>>> z = np.array([[0.+0.j, 2.+1.j, -1.+4.j]])
>>> abs(z.T-z)
array([[ 0.        ,  2.23606798,  4.12310563],
       [ 2.23606798,  0.        ,  4.24264069],
       [ 4.12310563,  4.24264069,  0.        ]])

As a complement, you may want to remove the duplicates afterwards by taking the upper triangle (the matrix is symmetric, so the lower triangle carries no extra information):

>>> np.triu(out)
array([[ 0.        ,  2.23606798,  4.12310563],
       [ 0.        ,  0.        ,  4.24264069],
       [ 0.        ,  0.        ,  0.        ]])
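
Note that np.triu leaves zeros in the lower triangle, which look the same as a genuine zero distance. If only the unique pairs are needed, one possible alternative (not part of the original answer) is to index the matrix with np.triu_indices. The names i, j and pair_dist are illustrative.

```python
import numpy as np

# Same sample points as in the example above.
z = np.array([[0 + 0j, 2 + 1j, -1 + 4j]])
out = abs(z.T - z)

# Row and column indices of the entries strictly above the diagonal
# (k=1 skips the diagonal, where every distance is zero).
i, j = np.triu_indices(out.shape[0], k=1)

# One distance per unordered pair of points.
pair_dist = out[i, j]

for a, b, d in zip(i, j, pair_dist):
    print(a, b, d)
```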

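Some benchmarks (each figure is the total time in seconds for timeit's default of 1,000,000 runs, on a 3-point array)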

>>> timeit.timeit('abs(z.T-z)', setup='import numpy as np;z = np.array([[0.+0.j, 2.+1.j, -1.+4.j]])')
4.645645342274779
>>> timeit.timeit('abs(z[..., np.newaxis] - z)', setup='import numpy as np;z = np.array([0.+0.j, 2.+1.j, -1.+4.j])')
5.049334864854522
>>> timeit.timeit('m, n = np.meshgrid(z, z); abs(m-n)', setup='import numpy as np;z = np.array([0.+0.j, 2.+1.j, -1.+4.j])')
22.489568296184686
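
The timings above use only three points, so they are likely dominated by per-call overhead. Since the question mentions up to 100 points, the following sketch reruns the three variants at that size. The random points, the function names and the run count of 1000 are assumptions chosen for illustration, and no particular result is claimed; numbers will vary by machine.

```python
import timeit

import numpy as np

# 100 random 2-D points (illustrative data), encoded as complex numbers.
rng = np.random.default_rng(0)
pts = rng.random((100, 2))
z = pts[:, 0] + 1j * pts[:, 1]
z_row = z[np.newaxis, :]  # single-row 2-D version, shape (1, 100)


def via_transpose():
    return abs(z_row.T - z_row)


def via_newaxis():
    return abs(z[..., np.newaxis] - z)


def via_meshgrid():
    m, n = np.meshgrid(z, z)
    return abs(m - n)


# Total seconds for 1000 calls of each variant.
timings = {
    fn.__name__: timeit.timeit(fn, number=1000)
    for fn in (via_transpose, via_newaxis, via_meshgrid)
}

for name, seconds in timings.items():
    print(name, seconds)
```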
