How to compute cluster assignments from linkage/distance matrices in scipy in Python?


Question

If you have this hierarchical clustering call in scipy in Python:

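from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform
# dist_matrix is the full (square, N x N) distance matrix;
# squareform converts it to the condensed form that linkage expects
linkage_matrix = linkage(squareform(dist_matrix), linkage_method)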

then what's an efficient way to go from this to cluster assignments for individual points? That is, a vector of length N, where N is the number of points and entry i is the cluster number of point i, given the clusters produced by applying a threshold thresh to the resulting clustering?

To clarify: the cluster number is the cluster a point is in after applying a threshold to the tree. Each leaf node then gets a unique cluster, unique in the sense that each point belongs to one "most specific cluster", which is defined by the threshold where you cut the dendrogram.

I know that scipy.cluster.hierarchy.fclusterdata gives you this cluster assignment as its return value, but I am starting from a custom-made distance matrix and distance metric, so I cannot use fclusterdata. The question boils down to: how can I compute what fclusterdata is computing, namely the cluster assignments?

Recommended answer

If I understand you right, that is what fcluster does:

scipy.cluster.hierarchy.fcluster(Z, t, criterion='inconsistent', depth=2, R=None, monocrit=None)

Forms flat clusters from the hierarchical clustering defined by the linkage matrix Z.

...

Returns: An array of length n. T[i] is the flat cluster number to which original observation i belongs.
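To make the return value concrete, here is a minimal sketch on made-up data (the four points, the threshold of 1.0, and the choice of single linkage are all assumptions for illustration, not part of the question). It shows that the result has one entry per original observation and that points in the same flat cluster share a label:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Made-up data: four points on a line, forming two tight pairs far apart.
points = np.array([0.0, 0.1, 5.0, 5.1])

# Square N x N distance matrix (absolute differences between points).
dist_matrix = np.abs(points[:, None] - points[None, :])

# linkage expects the condensed form, so convert with squareform.
Z = linkage(squareform(dist_matrix), method="single")

# Cut the dendrogram at distance 1.0; T[i] is the label of observation i.
T = fcluster(Z, t=1.0, criterion="distance")

print(T.shape)  # one label per original observation
print(T)        # labels are integers starting at 1
```

Here T[0] and T[1] share one label and T[2] and T[3] share another, because each pair merges well below the cut while the two pairs only join far above it.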

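So just call fcluster(linkage_matrix, t), where t is your threshold. Note that with the default criterion='inconsistent', t is compared against inconsistency coefficients; to cut the dendrogram at a given distance, pass criterion='distance'.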
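Putting the question's linkage call together with fcluster, the whole path from a custom distance matrix to per-point labels could look like the sketch below. The helper name assign_clusters, the random data, the city-block metric standing in for a custom metric, and the threshold 2.5 are all hypothetical; criterion='distance' is chosen on the assumption that the threshold is a height at which to cut the dendrogram.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def assign_clusters(dist_matrix, thresh, linkage_method="average"):
    """Hypothetical helper: flat cluster labels from a square distance matrix."""
    # Convert the square N x N matrix to the condensed vector linkage expects.
    condensed = squareform(dist_matrix)
    Z = linkage(condensed, method=linkage_method)
    # criterion="distance" cuts the tree at cophenetic distance `thresh`.
    return fcluster(Z, t=thresh, criterion="distance")


# Made-up input: a distance matrix built outside of fclusterdata.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
dist_matrix = squareform(pdist(X, metric="cityblock"))

labels = assign_clusters(dist_matrix, thresh=2.5)
print(labels)  # labels[i] is the flat cluster of point i
```

When the distance matrix does come from a metric that pdist supports, this should give the same labels as fclusterdata called with the same metric, method, and criterion, since fclusterdata runs the same pdist, linkage, fcluster sequence internally.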
