Efficiently Calculating a Euclidean Distance Matrix in NumPy?


Question

I have a large array (~20k entries) of two-dimensional data, and I want to calculate the pairwise Euclidean distance between all entries. I need the output in standard square form. Multiple solutions to this problem have been proposed, but none of them seems to work efficiently for large arrays.

The method based on complex numbers and transposing fails for large arrays.

SciPy's pdist seems to be the most efficient method that works with NumPy arrays. However, calling squareform on the result to obtain a square matrix makes it very inefficient.

So the best I could come up with is SciPy's cdist, which is somewhat awkward because it calculates every pairwise distance twice. The timings below show the advantage of pdist for the raw distance calculation.

Complex: 49.605 s

Cdist: 4.820 s

Pdist: 1.785 s

Pdist with squareform: 10.212 s
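For readers who want to reproduce the comparison, the variants named above can be sketched roughly as follows. This is an illustration on a small random input, not the code that produced the timings: the function names are made up here, and `dist_complex` is only one plausible reading of the "complex" method (treating each 2-D point as a complex number).

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform


def dist_complex(points):
    # Treat each 2-D point (x, y) as the complex number x + 1j*y;
    # the modulus of a difference is then the Euclidean distance.
    z = points[:, 0] + 1j * points[:, 1]
    return np.abs(z[:, None] - z[None, :])


def dist_cdist(points):
    # Fills all n*n entries, so every unordered pair is evaluated twice.
    return cdist(points, points, "euclidean")


def dist_pdist_square(points):
    # pdist returns only the n*(n-1)/2 condensed distances;
    # squareform expands them into the symmetric n-by-n matrix.
    return squareform(pdist(points, "euclidean"))


# Small hypothetical input; the question uses roughly 20k points.
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(200, 2))

d_complex = dist_complex(points)
d_cdist = dist_cdist(points)
d_pdist = dist_pdist_square(points)

# All three variants should agree up to floating-point rounding.
print(np.allclose(d_complex, d_cdist), np.allclose(d_cdist, d_pdist))
```

All three return the same square matrix; they differ only in how much redundant work and temporary memory they need.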

Recommended answer

I tried both NumPy broadcasting and scipy.spatial.distance.cdist, and the two seem to be similar in time efficiency:

import time

import numpy as np
from scipy.spatial.distance import cdist


def sort_distances(d):
    # Transpose so that row j holds the distances from b[j] to every row of a,
    # then sort each row, reusing one argsort for both values and indices.
    d = np.transpose(d)
    sorted_ind = np.argsort(d, axis=1)
    sorted_d = np.take_along_axis(d, sorted_ind, axis=1)
    return sorted_d, sorted_ind


def dist_numpy(a, b):
    # Broadcasting builds an intermediate of shape (len(a), len(b), n_features),
    # about 4 GB in float32 for the default sizes used below.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return sort_distances(d)


def dist_scipy(a, b):
    d = cdist(a, b, 'euclidean')
    return sort_distances(d)


def get_a_b(r=10**4, c=10**1):
    # r points with c features each, drawn uniformly from [-1, 1) as float32.
    a = np.random.uniform(-1, 1, (r, c)).astype('f')
    b = np.random.uniform(-1, 1, (r, c)).astype('f')
    return a, b


if __name__ == "__main__":
    a, b = get_a_b()
    st_t = time.perf_counter()
    # Uncomment exactly one of the two calls to time that variant.
    # dist_numpy(a, b)
    dist_scipy(a, b)
    print('it took {} s'.format(time.perf_counter() - st_t))
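The broadcasting in dist_numpy materializes every pairwise difference vector before taking the norm. One commonly used alternative, which is not part of the answer above, expands the squared distance as ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b so that the heavy lifting becomes a single matrix product. The sketch below uses a hypothetical helper name, `dist_gram`, and small made-up inputs. It can be less accurate than cdist for nearly coincident points, because the subtraction cancels most significant digits.

```python
import numpy as np
from scipy.spatial.distance import cdist


def dist_gram(a, b):
    # Squared norm of every row, shaped so the sum broadcasts to (n, m).
    a_sq = np.einsum("ij,ij->i", a, a)[:, None]
    b_sq = np.einsum("ij,ij->i", b, b)[None, :]
    # ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 * (a_i . b_j)
    sq = a_sq + b_sq - 2.0 * (a @ b.T)
    # Rounding can leave tiny negative values; clip them before the square root.
    np.maximum(sq, 0.0, out=sq)
    return np.sqrt(sq)


# Small hypothetical inputs, used only to compare against cdist.
rng = np.random.default_rng(0)
a = rng.uniform(-1.0, 1.0, size=(300, 10))
b = rng.uniform(-1.0, 1.0, size=(250, 10))

d_gram = dist_gram(a, b)
d_ref = cdist(a, b, "euclidean")

# Largest absolute deviation from the cdist result.
print(np.abs(d_gram - d_ref).max())
```

Only arrays of shape (n, m) are allocated here, rather than the (n, m, c) intermediate of the broadcasting version. Whether it is actually faster than cdist depends on the input sizes and the BLAS build, so it is worth timing on the real data.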
