Python中,成对“距离”,需要一个快速的方法来做到这一点 [英] Python, Pairwise 'distance', need a fast way to do it

查看:214
本文介绍了Python中,成对“距离”,需要一个快速的方法来做到这一点的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有关在我的博士方项目,我从事Python的造型有些系统的任务。效率明智的,我计划打在下面的问题,我将在一个最小工作实例揭露的瓶颈。

For a side project in my PhD, I engaged in the task of modelling some system in Python. Efficiency wise, my program hits a bottleneck in the following problem, which I'll expose in a Minimal Working Example.

我处理与他们的3D开始和端点的大量段连接codeD的,所以每个段再由标量6 psented $ P $。

I deal with a large number of segments encoded by their 3D beginning and endpoints, so each segment is represented by 6 scalars.

我需要计算成对最小的板块间的距离。两段之间的最小距离的分析前pression在这个被找到。到MWE:

I need to calculate a pairwise minimal intersegment distance. The analytical expression of the minimal distance between two segments is found in this source. To the MWE:

import numpy as np
N_segments = 1000
List_of_segments = np.random.rand(N_segments, 6)

Pairwise_minimal_distance_matrix = np.zeros( (N_segments,N_segments) )
for i in range(N_segments):
    for j in range(i+1,N_segments): 

        p0 = List_of_segments[i,0:3] #beginning point of segment i
        p1 = List_of_segments[i,3:6] #end point of segment i
        q0 = List_of_segments[j,0:3] #beginning point of segment j
        q1 = List_of_segments[j,3:6] #end point of segment j
        #for readability, some definitions
        a = np.dot( p1-p0, p1-p0)
        b = np.dot( p1-p0, q1-q0)
        c = np.dot( q1-q0, q1-q0)
        d = np.dot( p1-p0, p0-q0)
        e = np.dot( q1-q0, p0-q0)
        s = (b*e-c*d)/(a*c-b*b)
        t = (a*e-b*d)/(a*c-b*b)
        #the minimal distance between segment i and j
        Pairwise_minimal_distance_matrix[i,j] = sqrt(sum( (p0+(p1-p0)*s-(q0+(q1-q0)*t))**2)) #minimal distance

现在,我意识到这是非常低效的,这就是为什么我在这里。我在如何避免环路广泛看了看,但是我遇到了一点问题。显然,这种计算最好用<一个完成href=\"http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html#scipy.spatial.distance.cdist\">cdist蟒蛇。但是,自定义距离函数,它可以处理需要成为二进制函数。这是在我的情况的一个问题,因为我的载体具有明确的6的长度,并有位分裂成他们的第一个和最后3个组成部分。我不认为我可以在距离计算转化为一个二元函数。

Now, I realize this is extremely inefficient, and this is why I am here. I have looked extensively in how to avoid the loop, but I run into a bit of a problem. Apparently, this sort of calculations is best done with the cdist of python. However, the custom distance functions it can handle have to be binary functions. This is a problem in my case, because my vectors have specifically a length of 6, and have to bit split into their first and last 3 components. I don't think I can translate the distance calculation into a binary function.

任何输入AP preciated。

Any input is appreciated.

推荐答案

您可以使用numpy的的矢量能力,加速计算。我的版本计算一次距离矩阵中的所有元素,然后设置对角线和下部三角形为零。

You can use numpy's vectorization capabilities to speed up the calculation. My version computes all elements of the distance matrix at once and then sets the diagonal and the lower triangle to zero.

def pairwise_distance2(s):
    # we need this because we're gonna divide by zero
    old_settings = np.seterr(all="ignore")

    N = N_segments # just shorter, could also use len(s)

    # we repeat p0 and p1 along all columns
    p0 = np.repeat(s[:,0:3].reshape((N, 1, 3)), N, axis=1)
    p1 = np.repeat(s[:,3:6].reshape((N, 1, 3)), N, axis=1)
    # and q0, q1 along all rows
    q0 = np.repeat(s[:,0:3].reshape((1, N, 3)), N, axis=0)
    q1 = np.repeat(s[:,3:6].reshape((1, N, 3)), N, axis=0)

    # element-wise dot product over the last dimension,
    # while keeping the number of dimensions at 3
    # (so we can use them together with the p* and q*)
    a = np.sum((p1 - p0) * (p1 - p0), axis=-1).reshape((N, N, 1))
    b = np.sum((p1 - p0) * (q1 - q0), axis=-1).reshape((N, N, 1))
    c = np.sum((q1 - q0) * (q1 - q0), axis=-1).reshape((N, N, 1))
    d = np.sum((p1 - p0) * (p0 - q0), axis=-1).reshape((N, N, 1))
    e = np.sum((q1 - q0) * (p0 - q0), axis=-1).reshape((N, N, 1))

    # same as above
    s = (b*e-c*d)/(a*c-b*b)
    t = (a*e-b*d)/(a*c-b*b)

    # almost same as above
    pairwise = np.sqrt(np.sum( (p0 + (p1 - p0) * s - ( q0 + (q1 - q0) * t))**2, axis=-1))

    # turn the error reporting back on
    np.seterr(**old_settings)

    # set everything at or below the diagonal to 0
    pairwise[np.tril_indices(N)] = 0.0

    return pairwise

现在,让我们把它兜风。随着你的榜样, N = 1000 ,我得到的时序

Now let's take it for a spin. With your example, N = 1000, I get a timing of

%timeit pairwise_distance(List_of_segments)
1 loops, best of 3: 10.5 s per loop

%timeit pairwise_distance2(List_of_segments)
1 loops, best of 3: 398 ms per loop

当然,结果是相同的:

And of course, the results are the same:

(pairwise_distance2(List_of_segments) == pairwise_distance(List_of_segments)).all()

收益。我也在pretty肯定有一个矩阵乘法的某处隐藏在算法,所以应该有进一步的加速(和清理),一些潜在的。

returns True. I'm also pretty sure there's a matrix multiplication hidden somewhere in the algorithm, so there should be some potential for further speedup (and also cleanup).

顺便说一句:我先用numba没有成功尝试简单。不知道为什么,虽然。

By the way: I've tried simply using numba first without success. Not sure why, though.

这篇关于Python中,成对“距离”,需要一个快速的方法来做到这一点的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆