Alternative to scipy.cluster.hierarchy.cut_tree()

Question

I was running an agglomerative hierarchical clustering experiment in Python 3 and found that scipy.cluster.hierarchy.cut_tree() does not return the requested number of clusters for some input linkage matrices. So I now know there is a bug in the cut_tree() function (as described here).

However, I need a flat clustering that assigns k different labels to my data points. Do you know an algorithm that produces a flat clustering with k labels from an arbitrary input linkage matrix Z? My question boils down to: how can I compute what cut_tree() computes, from scratch and without the bug?
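One way to reproduce what cut_tree() is meant to compute is to replay the merges recorded in Z. For n observations, row i of a linkage matrix merges clusters Z[i, 0] and Z[i, 1] into a new cluster numbered n + i. Applying the first n - k rows therefore leaves exactly k clusters, whether or not the merge heights are monotonic. The sketch below assumes this merge-order definition of "k clusters" is the one wanted. cut_linkage is a hypothetical helper name, and the random data stands in for dataset.npy.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def cut_linkage(Z, k):
    """Hypothetical helper: flat labels 0..k-1 after applying the first
    n - k merges recorded in the linkage matrix Z."""
    Z = np.asarray(Z)
    n = Z.shape[0] + 1                     # a linkage matrix has n - 1 rows
    if not 1 <= k <= n:
        raise ValueError(f"k must be between 1 and {n}, got {k}")

    # Union-find over the n leaves and the n - 1 merged clusters
    parent = list(range(2 * n - 1))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Row i merges clusters Z[i, 0] and Z[i, 1] into new cluster n + i
    for i in range(n - k):
        for child in (int(Z[i, 0]), int(Z[i, 1])):
            parent[find(child)] = n + i

    # Number the k surviving roots by first appearance over the leaves,
    # i.e. by each cluster's smallest member index (0-based labels)
    roots = [find(leaf) for leaf in range(n)]
    relabel = {}
    return np.array([relabel.setdefault(r, len(relabel)) for r in roots])


# Usage on synthetic data (a stand-in for dataset.npy)
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
Z = linkage(X, method='centroid')          # centroid linkage may contain inversions
labels = cut_linkage(Z, 10)
print(np.bincount(labels))                 # sizes of the 10 clusters
```

Replaying rows in their stored order is always a valid bottom-up order, because each row refers only to clusters created by earlier rows. Sorting merges by height is not guaranteed to be valid when the tree has inversions.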

You can test your code with this dataset.

import numpy as np
from scipy.cluster.hierarchy import linkage, is_valid_linkage
from scipy.spatial.distance import pdist

## Load dataset
X = np.load("dataset.npy")

## Hierarchical clustering
dists = pdist(X)  # condensed Euclidean distance matrix
# metric is ignored here because dists is already a distance matrix
Z = linkage(dists, method='centroid', metric='euclidean')

print(is_valid_linkage(Z))

## Now let's say we want the flat cluster assignment with 10 clusters.
#  If cut_tree() worked, we would do
from scipy.cluster.hierarchy import cut_tree
cut = cut_tree(Z, n_clusters=10)  # array of shape (n_observations, 1)
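To check whether a particular Z triggers the problem, one can ask cut_tree() for several cluster counts at once and count the distinct labels in each returned column. The sketch below does this on synthetic data (a stand-in for dataset.npy). It also reports scipy's is_monotonic(Z), because centroid linkage can produce inversions, where a merge is recorded at a lower height than an earlier one. Treating non-monotonic linkage as the trigger is an assumption this code can suggest but not prove. clusters_per_cut is a hypothetical helper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cut_tree, is_monotonic


def clusters_per_cut(Z, ks):
    """Hypothetical helper: map each requested k to the number of
    distinct labels cut_tree() actually returned for it."""
    # One column per requested cluster count, in the order given
    cuts = cut_tree(Z, n_clusters=np.asarray(ks))
    return {k: len(np.unique(cuts[:, j])) for j, k in enumerate(ks)}


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))               # synthetic stand-in for dataset.npy
for method in ('average', 'centroid'):
    Z = linkage(X, method=method)
    counts = clusters_per_cut(Z, [2, 5, 10])
    wrong = {k: got for k, got in counts.items() if got != k}
    print(f"{method}: monotonic={is_monotonic(Z)}, mismatched counts={wrong}")
```

An empty `mismatched counts` dictionary means cut_tree() returned exactly the requested number of clusters for every k tried.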

Side note: an alternative might be to call R's cutree() through rpy2 instead of scipy's cut_tree(), but I have never used it. What do you think?

Solution

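One way to obtain k flat clusters is to use scipy.cluster.hierarchy.fcluster with criterion='maxclust'. Note that this criterion forms at most k flat clusters, so for some linkage matrices it may return fewer than k. Its labels also start at 1 rather than 0: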

from scipy.cluster.hierarchy import fcluster
k = 10  # desired number of flat clusters
clust = fcluster(Z, k, criterion='maxclust')  # one label per observation
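Here is a runnable version of the answer's snippet that also checks the result. It uses synthetic data as a stand-in for dataset.npy. Because criterion='maxclust' guarantees at most k clusters rather than exactly k, the sketch counts the clusters actually returned. It then remaps the 1-based fcluster labels to consecutive 0-based ones, which is the convention cut_tree() uses.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))               # synthetic stand-in for dataset.npy
Z = linkage(X, method='centroid')

k = 10
clust = fcluster(Z, k, criterion='maxclust')   # 1-based cluster numbers

# 'maxclust' forms at most k clusters, so check how many were returned
n_found = len(np.unique(clust))
if n_found < k:
    print(f"fcluster returned {n_found} clusters instead of {k}")

# Map the labels to consecutive 0-based integers, as cut_tree() returns
_, labels0 = np.unique(clust, return_inverse=True)
print(np.bincount(labels0))                # cluster sizes
```

Using np.unique with return_inverse avoids relying on fcluster's exact numbering scheme. It only assumes that equal labels mean the same cluster.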