Efficient way of constructing a matrix of pair-wise distances between many vectors?


Question

First, thanks for reading and taking the time to respond.

Second, the problem:

I have a PxN matrix X where P is on the order of 10^6 and N is on the order of 10^3. So X is relatively large and is not sparse. Let's say each row of X is an N-dimensional sample. I want to construct a PxP matrix of pairwise distances between these P samples. Let's also say I am interested in Hellinger distances.

So far I am relying on sparse dok matrices:

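import math
import numpy as np
import scipy as sp
import scipy.sparse

def hellinger_distance(X):
    P = X.shape[0]
    H1 = sp.sparse.dok_matrix((P, P))
    for i in range(P):
        if i % 100 == 0:
            print(i)  # progress indicator
        x1 = X[i]
        X2 = X[i:P]
        # Hellinger distance from row i to rows i..P-1
        h = np.sqrt(((np.sqrt(x1) - np.sqrt(X2))**2).sum(1)) / math.sqrt(2)
        H1[i, i:P] = h
    # Mirror the upper triangle; the diagonal is zero, so it is not doubled
    H = H1 + H1.T
    return H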

This is super slow. Is there a more efficient way of doing this? Any help is much appreciated.

Recommended answer

You can use pdist and squareform from scipy.spatial.distance:

import numpy as np
from scipy.spatial.distance import pdist, squareform

out = squareform(pdist(np.sqrt(X)))/np.sqrt(2)
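The one-liner works because the Hellinger distance in the question is the Euclidean distance between element-wise square roots, divided by sqrt(2). The following is a minimal sketch that checks this on small random data. The helper names `hellinger_loop` and `hellinger_pdist` and the test matrix `X_small` are made up for illustration and are not part of the original answer.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def hellinger_loop(X):
    """Reference version: explicit double loop over all row pairs."""
    P = X.shape[0]
    H = np.zeros((P, P))
    for i in range(P):
        for j in range(P):
            diff = np.sqrt(X[i]) - np.sqrt(X[j])
            H[i, j] = np.sqrt((diff ** 2).sum()) / np.sqrt(2)
    return H

def hellinger_pdist(X):
    """Same quantity via Euclidean pdist on the square-rooted rows."""
    return squareform(pdist(np.sqrt(X))) / np.sqrt(2)

# Hypothetical small input: 5 samples with 4 non-negative features each,
# normalised so that every row sums to 1.
rng = np.random.default_rng(0)
X_small = rng.random((5, 4))
X_small /= X_small.sum(axis=1, keepdims=True)

agree = np.allclose(hellinger_loop(X_small), hellinger_pdist(X_small))
```

Note that pdist returns only the condensed upper triangle, and squareform expands it into the full symmetric PxP array.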

Or use cdist from the same module:

import numpy as np
from scipy.spatial.distance import cdist

sX = np.sqrt(X)
out = cdist(sX,sX)/np.sqrt(2)
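One caveat for the sizes in the question: with P around 10^6, a dense PxP float64 result has 10^12 entries, roughly 8 TB, so it probably cannot be held in memory all at once. A possible workaround, sketched here as an assumption rather than as part of the original answer, is to call cdist on one block of rows at a time and process or store each block before moving on. The function name `hellinger_blocks` and the parameter `block_rows` are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist

def hellinger_blocks(X, block_rows=1024):
    """Yield (start, stop, block), where block holds the Hellinger
    distances from rows start..stop-1 to every row of X."""
    sX = np.sqrt(X)  # take the square roots once, up front
    n = sX.shape[0]
    for start in range(0, n, block_rows):
        stop = min(start + block_rows, n)
        yield start, stop, cdist(sX[start:stop], sX) / np.sqrt(2)

# Hypothetical small input, used only to show the calling pattern.
rng = np.random.default_rng(1)
X_demo = rng.random((10, 3))

# Stacking the blocks reproduces the full matrix on small data.
stacked = np.vstack([block for _, _, block in hellinger_blocks(X_demo, block_rows=4)])
full = cdist(np.sqrt(X_demo), np.sqrt(X_demo)) / np.sqrt(2)
```

Each block has shape (stop - start, P), so `block_rows` controls the peak memory; in practice a block would be written to disk or reduced (for example to nearest neighbours) rather than stacked.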
