Use a Distance Matrix in scipy.cluster.hierarchy.linkage()?
Question
I have an n*n distance matrix M, where M_ij is the distance between object_i and object_j. So, as expected, it takes the following form:
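$$
M = \begin{pmatrix}
0 & M_{01} & M_{02} & \cdots & M_{0,n-1} \\
M_{10} & 0 & M_{12} & \cdots & M_{1,n-1} \\
M_{20} & M_{21} & 0 & \cdots & M_{2,n-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
M_{n-1,0} & M_{n-1,1} & M_{n-1,2} & \cdots & 0
\end{pmatrix}
$$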
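Before clustering, it may be worth confirming that M really has this shape: square, symmetric, with a zero diagonal. A minimal sketch using scipy.spatial.distance.is_valid_dm; the 4×4 matrix is made up purely for illustration.

```python
import numpy as np
from scipy.spatial.distance import is_valid_dm

# Illustrative 4x4 distance matrix (made-up values): symmetric, zero diagonal
M = np.array([
    [0.0, 2.0, 6.0, 10.0],
    [2.0, 0.0, 5.0, 9.0],
    [6.0, 5.0, 0.0, 4.0],
    [10.0, 9.0, 4.0, 0.0],
])

# is_valid_dm checks that the array is square, symmetric and has a zero diagonal
ok = is_valid_dm(M)
print(ok)  # True

# A matrix with a non-zero diagonal entry is rejected
bad = M.copy()
bad[0, 0] = 1.0
print(is_valid_dm(bad))  # False
```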
Now I wish to cluster these n objects with hierarchical clustering. SciPy has an implementation of this: scipy.cluster.hierarchy.linkage(y, method='single', metric='euclidean').
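One detail of this signature that is easy to miss: according to the SciPy documentation, `metric` is only used when `y` is a 2-D array of observation vectors and is ignored otherwise. So when `y` is already a condensed distance vector, the distances are taken as given. A small sketch with a made-up condensed vector for three objects:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Made-up condensed distances for 3 objects, ordered (0,1), (0,2), (1,2)
d = np.array([1.0, 3.0, 2.0])

Z_euclid = linkage(d, method='single', metric='euclidean')
Z_city = linkage(d, method='single', metric='cityblock')

# metric has no effect on a condensed vector, so both linkages are identical
print(np.array_equal(Z_euclid, Z_city))  # True

# Column 2 of the linkage matrix holds the merge heights
print(Z_euclid[:, 2])
```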
Its documentation says:
y must be a {n \choose 2} sized vector where n is the number of original observations paired in the distance matrix.
y : ndarray
A condensed or redundant distance matrix. A condensed distance matrix is a flat array containing the upper triangular of the distance matrix. This is the form that pdist returns. Alternatively, a collection of m observation vectors in n dimensions may be passed as an m by n array.
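The two accepted input forms can be compared directly. The sketch below (with made-up points) shows that pdist returns a flat vector of length m(m-1)/2, that squareform converts between condensed and square form, and that linkage gives the same result whether it is handed the condensed distances or the raw observation vectors.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage

# Made-up observation vectors: m = 5 points in 2 dimensions
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0], [6.0, 5.0]])
m = X.shape[0]

# pdist returns the condensed form: a flat vector of length m*(m-1)/2
d = pdist(X)  # Euclidean by default
print(d.shape)  # (10,)

# squareform converts condensed <-> square (redundant) form without loss
D = squareform(d)
print(D.shape)  # (5, 5)
print(np.allclose(squareform(D), d))  # True

# Both accepted inputs give the same linkage:
# a condensed distance vector, or the raw m-by-n observation matrix
Z_from_condensed = linkage(d, method='single')
Z_from_points = linkage(X, method='single', metric='euclidean')
print(np.allclose(Z_from_condensed, Z_from_points))  # True
```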
I am confused by this description of y. Can I directly feed my M in as the input y?
Update
@hongbo-zhu-cn has raised this issue on GitHub. This is exactly what I am concerned about. However, as a newcomer to GitHub, I don't know how it works and therefore have no idea how this issue is being dealt with.
Recommended answer
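It seems that indeed we cannot pass the redundant square matrix in directly, although the documentation claims we can. The reason is that linkage treats any 2-D array as a collection of observation vectors, so each row of M is read as a point and new distances are computed between those rows.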
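To see the difference concretely, here is a sketch with three made-up objects at positions 0, 1 and 3 on a line. Given the condensed vector, linkage merges at the true distances 1 and 2. Given the square matrix, it treats each row as a point in R^3, so the merge heights become Euclidean distances between rows. SciPy may also emit a ClusterWarning when a 2-D input looks like an uncondensed distance matrix; the sketch silences it so the output stays readable.

```python
import warnings
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, ClusterWarning

# Made-up example: three objects at positions 0, 1 and 3 on a line
M = np.array([
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
])

# Correct: condense first, so linkage uses the distances as given
Z_right = linkage(squareform(M), method='single')

# Pitfall: a 2-D array is read as observation vectors, so each row of M
# becomes a point in R^3 and Euclidean distances between rows are used
with warnings.catch_warnings():
    warnings.simplefilter('ignore', ClusterWarning)
    Z_wrong = linkage(M, method='single', metric='euclidean')

print(Z_right[:, 2])  # merge heights: [1. 2.]
print(Z_wrong[:, 2])  # merge heights: sqrt(3) and sqrt(12), roughly 1.732 and 3.464
```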
To help anyone who faces the same problem in the future, I am posting my solution as an additional answer here, so that copy-and-paste readers can proceed directly with the clustering.
Use the following snippet to condense the matrix and happily proceed.
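import scipy.spatial.distance as ssd
from scipy.cluster.hierarchy import linkage
# convert the redundant n*n square matrix form into a condensed nC2 array
# (distMatrix must be symmetric with a zero diagonal, otherwise squareform raises a ValueError)
distArray = ssd.squareform(distMatrix) # for 0 <= i < j < n, distArray[{n choose 2}-{n-i choose 2} + (j-i-1)] is the distance between points i and j
Z = linkage(distArray, method='single') # pass the condensed array, not the square matrix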
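A self-contained version of the same workflow, useful for checking that the pieces fit together. The 4×4 matrix is made up; `condensed_index` is a hypothetical helper that just encodes the index formula from the code comment (0-based, i < j); and fcluster is used here, as one possible next step, to cut the tree into two flat clusters.

```python
import numpy as np
import scipy.spatial.distance as ssd
from scipy.cluster.hierarchy import linkage, fcluster

# Made-up symmetric distance matrix with a zero diagonal (n = 4)
distMatrix = np.array([
    [0.0, 2.0, 6.0, 10.0],
    [2.0, 0.0, 5.0, 9.0],
    [6.0, 5.0, 0.0, 4.0],
    [10.0, 9.0, 4.0, 0.0],
])
n = distMatrix.shape[0]

# Condense: length n*(n-1)/2, upper triangle read row by row
distArray = ssd.squareform(distMatrix)

def condensed_index(i, j, n):
    """Hypothetical helper: position of pair (i, j), 0 <= i < j < n,
    computed as C(n,2) - C(n-i,2) + (j - i - 1)."""
    return n * (n - 1) // 2 - (n - i) * (n - i - 1) // 2 + (j - i - 1)

# Every off-diagonal entry lands where the formula says it should
for i in range(n):
    for j in range(i + 1, n):
        assert distArray[condensed_index(i, j, n)] == distMatrix[i, j]

# Cluster on the condensed array, then cut the tree into 2 flat clusters
Z = linkage(distArray, method='single')
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)
```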
Please correct me if I am wrong.