How does condensed distance matrix work? (pdist)
Question
scipy.spatial.distance.pdist returns a condensed distance matrix. From the documentation:
Returns a condensed distance matrix Y. For each i and j (where i < j), the metric dist(u=X[i], v=X[j]) is computed and stored in entry ij.
I thought ij meant i*j. But I think I might be wrong. Consider
X = array([[1,2], [1,2], [3,4]])
dist_matrix = pdist(X)
then the documentation says that dist(X[0], X[2]) should be dist_matrix[0*2]. However, dist_matrix[0*2] is 0, not 2.8 as it should be.
What's the formula I should use to access the similarity of two vectors, given i and j?
Recommended answer
You can look at it this way: Suppose x is m by n. The possible pairs of the m rows, chosen two at a time, are itertools.combinations(range(m), 2), e.g., for m=3:
>>> from itertools import combinations
>>> list(combinations(range(3), 2))
[(0, 1), (0, 2), (1, 2)]
So if d = pdist(x), the kth tuple in combinations(range(m), 2) gives the indices of the rows of x associated with d[k].
Example:
>>> from numpy import array
>>> from scipy.spatial.distance import pdist, squareform
>>> x = array([[0,10],[10,10],[20,20]])
>>> pdist(x)
array([ 10. , 22.36067977, 14.14213562])
The first element is dist(x[0], x[1]), the second is dist(x[0], x[2]) and the third is dist(x[1], x[2]).
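The pairing described above can be turned into a lookup table. This is only a sketch: the name dist_of is made up for illustration, and it assumes the default Euclidean metric.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist

x = np.array([[0, 10], [10, 10], [20, 20]])
d = pdist(x)  # Euclidean metric by default

# combinations() yields the row pairs in the same order pdist stores them
pairs = list(combinations(range(len(x)), 2))

# dist_of is a hypothetical helper mapping (i, j) with i < j to a distance
dist_of = {pair: float(dk) for pair, dk in zip(pairs, d)}

print(dist_of[(0, 2)])  # distance between x[0] and x[2], about 22.36
```

Building the dictionary costs as much memory as d itself, so it is mainly useful for small inputs or for checking your understanding of the ordering.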
Or you can view it as the elements in the upper triangular part of the square distance matrix, strung together into a 1D array.
For example:
>>> squareform(pdist(x))
array([[ 0. , 10. , 22.361],
[ 10. , 0. , 14.142],
[ 22.361, 14.142, 0. ]])
>>> y = array([[0,10],[10,10],[20,20],[10,0]])
>>> squareform(pdist(y))
array([[ 0. , 10. , 22.361, 14.142],
[ 10. , 0. , 14.142, 10. ],
[ 22.361, 14.142, 0. , 22.361],
[ 14.142, 10. , 22.361, 0. ]])
>>> pdist(y)
array([ 10. , 22.361, 14.142, 14.142, 10. , 22.361])
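As for the formula the question asks about: counting how many upper-triangle entries come before position (i, j) gives a closed-form index. The sketch below uses a hypothetical helper named condensed_index (not a SciPy function) and assumes m is the number of rows passed to pdist. Row i is preceded by m*i - i*(i+1)/2 entries, and (i, j) is the (j - i - 1)th entry within row i.

```python
from itertools import combinations


def condensed_index(i, j, m):
    """Index into pdist's output for the distance between rows i and j.

    Hypothetical helper; m is the number of rows given to pdist.
    """
    if i == j:
        raise ValueError("the diagonal (i == j) is not stored")
    if i > j:
        i, j = j, i  # distances are symmetric, so order the pair
    # entries before row i, plus the offset of column j within row i
    return m * i - i * (i + 1) // 2 + (j - i - 1)


# Sanity check against the ordering produced by combinations()
m = 4
for k, (i, j) in enumerate(combinations(range(m), 2)):
    assert condensed_index(i, j, m) == k
```

With the question's X (m=3), condensed_index(0, 2, 3) is 1, so dist(X[0], X[2]) is dist_matrix[1]. If memory is not a concern, squareform(d)[i, j] gives the same value without any index arithmetic.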