Use Distance Matrix in scipy.cluster.hierarchy.linkage()?


Question

I have an n*n distance matrix M where M_ij is the distance between object_i and object_j. As expected, it takes the following form:

   /  0     M_01    M_02    ...    M_0n \
   | M_10    0      M_12    ...    M_1n |
   | M_20   M_21     0      ...    M_2n |
   |                ...                 |
   \ M_n0   M_n1    M_n2    ...      0  /

Now I wish to cluster these n objects with hierarchical clustering. SciPy has an implementation of this called scipy.cluster.hierarchy.linkage(y, method='single', metric='euclidean').

Its documentation says:

y must be a {n \choose 2} sized vector where n is the number of original observations paired in the distance matrix.

y : ndarray

A condensed or redundant distance matrix. A condensed distance matrix is a flat array containing the upper triangular of the distance matrix. This is the form that pdist returns. Alternatively, a collection of m observation vectors in n dimensions may be passed as an m by n array.
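To make the "condensed" versus "redundant" wording concrete, here is a minimal sketch. The three 1-D observations are made-up illustration data, not part of the original question; pdist and squareform are the SciPy functions the quoted documentation refers to.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Three illustrative 1-D observations (made-up data): positions 0, 1 and 5.
X = np.array([[0.0], [1.0], [5.0]])

# pdist returns the condensed form: one distance per unordered pair,
# in the order (0,1), (0,2), (1,2).
condensed = pdist(X, metric="euclidean")

# squareform expands the condensed vector into the redundant n-by-n matrix ...
square = squareform(condensed)

# ... and, given a square matrix, condenses it back into the flat vector.
round_trip = squareform(square)

n = X.shape[0]
print(len(condensed), n * (n - 1) // 2)  # both are "n choose 2"
print(square.shape)                      # (n, n)
```

The condensed vector holds only the entries above the diagonal, which is why its length is n choose 2 rather than n*n.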

I am confused by this description of y. Can I pass my M directly as the input y?

Update

@hongbo-zhu-cn has raised this issue on GitHub. This is exactly what I am concerned about. However, as a newcomer to GitHub, I don't know how it works and therefore have no idea how this issue is being handled.

Recommended Answer

It seems that we indeed cannot pass the redundant square matrix in directly, although the documentation claims we can.
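One way to see the problem is to compare both inputs on a tiny example. The sketch below uses a made-up 3*3 matrix (distances between points at 0, 1 and 5 on a line) and assumes that linkage treats a 2-D array as observation vectors, as the quoted documentation describes. The warning suppression is only there because some SciPy versions warn when the input looks like an uncondensed distance matrix.

```python
import warnings

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Made-up symmetric distance matrix for three points at 0, 1 and 5 on a line.
M = np.array([
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 4.0],
    [5.0, 4.0, 0.0],
])

# Condensed input: the values are used as distances.
Z_condensed = linkage(squareform(M), method="single")

# Square input: each row is treated as a 3-dimensional observation,
# so Euclidean distances between the rows are computed first.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # some SciPy versions warn about this input
    Z_square = linkage(M, method="single")

# The third column of a linkage matrix holds the merge distances.
print(Z_condensed[:, 2])
print(Z_square[:, 2])
```

The merge distances differ because the second call clusters the rows of M as points rather than using the entries of M as distances.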

To benefit anyone who faces the same problem in the future, I am writing up my solution as an additional answer here, so that those who just want to copy and paste can get on with the clustering.

Use the following snippet to condense the matrix, then proceed.

import scipy.spatial.distance as ssd

# distMatrix is the redundant n*n square distance matrix (M in the question).
# Convert it into a condensed array of length nC2.
# For i < j, distArray[{n choose 2} - {n-i choose 2} + (j-i-1)] is the distance between points i and j.
distArray = ssd.squareform(distMatrix)
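For completeness, here is a hedged end-to-end sketch of how the condensed array might then be used. The 4*4 matrix is made-up data, condensed_index is a hypothetical helper that only restates the index formula from the comment above, and the choices of method='single' and fcluster with criterion='maxclust' are illustrative assumptions rather than part of the original answer.

```python
from math import comb

import numpy as np
import scipy.spatial.distance as ssd
from scipy.cluster.hierarchy import fcluster, linkage

# Made-up distance matrix for four points at 0, 1, 5 and 6 on a line.
distMatrix = np.array([
    [0.0, 1.0, 5.0, 6.0],
    [1.0, 0.0, 4.0, 5.0],
    [5.0, 4.0, 0.0, 1.0],
    [6.0, 5.0, 1.0, 0.0],
])
n = distMatrix.shape[0]

# Condense the square matrix as in the snippet above.
distArray = ssd.squareform(distMatrix)


def condensed_index(n, i, j):
    """Hypothetical helper: position of pair (i, j), i < j, in the condensed array."""
    return comb(n, 2) - comb(n - i, 2) + (j - i - 1)


# Check the index formula against the square matrix for every pair i < j.
formula_ok = all(
    distArray[condensed_index(n, i, j)] == distMatrix[i, j]
    for i in range(n)
    for j in range(i + 1, n)
)

# Hierarchical clustering on the condensed distances, then cut into 2 clusters.
Z = linkage(distArray, method="single")
labels = fcluster(Z, t=2, criterion="maxclust")

print(formula_ok)
print(labels)  # points 0 and 1 share one label, points 2 and 3 the other
```

Note that the metric argument of linkage only matters when observation vectors are passed; with a condensed distance array the supplied distances are used as they are.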

Please correct me if I am wrong.
