阵列中的点之间的快速加权欧氏距离 [英] Fast weighted euclidean distance between points in arrays

查看:187
本文介绍了阵列中的点之间的快速加权欧氏距离的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要有效地计算欧几里德的加权的距离,每 X,Y 指在给定的阵列到其他每个 X,Y 指向另一阵列。这是code我预期它的工作原理:

I need to efficiently calculate the euclidean weighted distances for every x,y point in a given array to every other x,y point in another array. This is the code I have which works as expected:

import numpy as np
import random

def rand_data(integ):
    '''
    Function that generates 'integ' random values between [0.,1.)
    '''
    rand_dat = [random.random() for _ in range(integ)]

    return rand_dat

def weighted_dist(indx, x_coo, y_coo):
    '''
    Function that calculates *weighted* euclidean distances.
    '''
    dist_point_list = []
    # Iterate through every point in array_2.
    for indx2, x_coo2 in enumerate(array_2[0]):
        y_coo2 = array_2[1][indx2]
        # Weighted distance in x.
        x_dist_weight = (x_coo-x_coo2)/w_data[0][indx] 
        # Weighted distance in y.
        y_dist_weight = (y_coo-y_coo2)/w_data[1][indx] 
        # Weighted distance between point from array_1 passed and this point
        # from array_2.
        dist = np.sqrt(x_dist_weight**2 + y_dist_weight**2)
        # Append weighted distance value to list.
        dist_point_list.append(round(dist, 8))

    return dist_point_list


# Generate random x,y data points.
array_1 = np.array([rand_data(10), rand_data(10)], dtype=float)

# Generate weights for each x,y coord for points in array_1.
w_data = np.array([rand_data(10), rand_data(10)], dtype=float)

# Generate second larger array.
array_2 = np.array([rand_data(100), rand_data(100)], dtype=float)


# Obtain *weighted* distances for every point in array_1 to every point in array_2.
dist = []
# Iterate through every point in array_1.
for indx, x_coo in enumerate(array_1[0]):
    y_coo = array_1[1][indx]
    # Call function to get weighted distances for this point to every point in
    # array_2.
    dist.append(weighted_dist(indx, x_coo, y_coo))

最后名单 DIST 为点是在每个为点尽可能多的元素在第一阵列中拥有尽可能多的子列表在第二个(加权距离)。

The final list dist holds as many sub-lists as points are in the first array with as many elements in each as points are in the second one (the weighted distances).

我想知道如果有一种方法,使这个code更有效,也许使用 cdist 功能,因为这个过程变得相当昂贵,当阵列有大量的元素(这在我的情况下,他们有),当我把检查大量阵列的距离(这也是我有)

I'd like to know if there's a way to make this code more efficient, perhaps using the cdist function, because this process becomes quite expensive when the arrays have lots of elements (which in my case they have) and when I have to check the distances for lots of arrays (which I also have)

推荐答案

@Evan和@Martinis集团在正确的轨道上 - 在埃文的回答扩张,这是一个使用广播的快速计算n维加权欧式功能Python的无环路的距离:

@Evan and @Martinis Group are on the right track - to expand on Evan's answer, here's a function that uses broadcasting to quickly calculate the n-dimensional weighted euclidean distance without Python loops:

import numpy as np

def fast_wdist(A, B, W):
    """
    Compute the weighted euclidean distance between two arrays of points:

    D{i,j} = 
    sqrt( ((A{0,i}-B{0,j})/W{0,i})^2 + ... + ((A{k,i}-B{k,j})/W{k,i})^2 )

    inputs:
        A is an (k, m) array of coordinates
        B is an (k, n) array of coordinates
        W is an (k, m) array of weights

    returns:
        D is an (m, n) array of weighted euclidean distances
    """

    # compute the differences and apply the weights in one go using
    # broadcasting jujitsu. the result is (n, k, m)
    wdiff = (A[np.newaxis,...] - B[np.newaxis,...].T) / W[np.newaxis,...]

    # square and sum over the second axis, take the sqrt and transpose. the
    # result is an (m, n) array of weighted euclidean distances
    D = np.sqrt((wdiff*wdiff).sum(1)).T

    return D

要检查该工程确定,我们将它比作一个使用Python的嵌套以较慢的速度循环:

To check that this works OK, we'll compare it to a slower version that uses nested Python loops:

def slow_wdist(A, B, W):

    k,m = A.shape
    _,n = B.shape
    D = np.zeros((m, n))

    for ii in xrange(m):
        for jj in xrange(n):
            wdiff = (A[:,ii] - B[:,jj]) / W[:,ii]
            D[ii,jj] = np.sqrt((wdiff**2).sum())
    return D

首先,让我们确保这两个函数给出了相同的答案:

First, let's make sure that the two functions give the same answer:

# make some random points and weights
def setup(k=2, m=100, n=300):
    return np.random.randn(k,m), np.random.randn(k,n),np.random.randn(k,m)

a, b, w = setup()
d0 = slow_wdist(a, b, w)
d1 = fast_wdist(a, b, w)

print np.allclose(d0, d1)
# True

不用说,使用广播,而不是循环的Python版本是几个数量级速度:

Needless to say, the version that uses broadcasting rather than Python loops is several orders of magnitude faster:

%%timeit a, b, w = setup()
slow_wdist(a, b, w)
# 1 loops, best of 3: 647 ms per loop

%%timeit a, b, w = setup()
fast_wdist(a, b, w)
# 1000 loops, best of 3: 620 us per loop

这篇关于阵列中的点之间的快速加权欧氏距离的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆