How to compute cluster assignments from linkage/distance matrices in scipy in Python?

Question

If you have this hierarchical clustering call in scipy in Python:

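from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform
# dist_matrix is the full (square) N x N distance matrix;
# squareform converts it to the condensed form that linkage expects
linkage_matrix = linkage(squareform(dist_matrix), linkage_method)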

Then what's an efficient way to go from this to cluster assignments for individual points? That is, a vector of length N, where N is the number of points and each entry i is the cluster number of point i, given the clusters produced by applying a threshold thresh to the resulting clustering.

To clarify: the cluster number is the cluster a point is in after applying a threshold to the tree. In that case each leaf node gets exactly one cluster. It is unique in the sense that each point belongs to one "most specific cluster", which is defined by the threshold where you cut the dendrogram.

I know that scipy.cluster.hierarchy.fclusterdata gives you this cluster assignment as its return value, but I am starting from a custom-made distance matrix and distance metric, so I cannot use fclusterdata. The question boils down to: how can I compute what fclusterdata computes, i.e. the cluster assignments?

Recommended answer

If I understand you right, that is what fcluster does:

scipy.cluster.hierarchy.fcluster(Z, t, criterion='inconsistent', depth=2, R=None, monocrit=None)

Forms flat clusters from the hierarchical clustering defined by the linkage matrix Z.

...

Returns: An array of length n. T[i] is the flat cluster number to which original observation i belongs.

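So just call fcluster(linkage_matrix, t), where t is your threshold. Note that with the default criterion='inconsistent', t is compared against the inconsistency coefficient; to cut the dendrogram at a distance threshold, pass criterion='distance'.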
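For illustration, here is a minimal end-to-end sketch of that call starting from a precomputed distance matrix. The sample points, the absolute-difference "metric", the "average" linkage method, and the threshold value are all assumptions made up for this example; only linkage, squareform, and fcluster come from the question and answer. It uses criterion="distance" so that the threshold is read as a height at which the dendrogram is cut.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical data: two well-separated groups of points on a line.
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])

# Stand-in for a custom metric: absolute difference, stored as a
# square N x N matrix (symmetric, zero diagonal).
dist_matrix = np.abs(points[:, None] - points[None, :])

# linkage expects the condensed form, so convert the square matrix first.
linkage_matrix = linkage(squareform(dist_matrix), method="average")

# Cut the dendrogram at an assumed distance threshold.
thresh = 1.0
assignments = fcluster(linkage_matrix, t=thresh, criterion="distance")

# assignments[i] is the flat cluster label of original point i.
print(assignments)
```

The result is an integer array of length N, which is the per-point assignment vector the question asks for. The label values themselves are arbitrary identifiers; only which points share a label is meaningful.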
