Alternative to scipy.cluster.hierarchy.cut_tree()
Problem description
I was doing an agglomerative hierarchical clustering experiment in Python 3 and I found that scipy.cluster.hierarchy.cut_tree() does not return the requested number of clusters for some input linkage matrices. So, by now I know there is a bug in the cut_tree() function (as described here).
However, I need to be able to get a flat clustering that assigns k different labels to my datapoints. Do you know an algorithm to get a flat clustering with k labels from an arbitrary input linkage matrix Z? My question boils down to: how can I compute, from scratch and without the bug, what cut_tree() is supposed to compute?
You can test your code with this dataset.
import numpy as np
from scipy.cluster.hierarchy import linkage, is_valid_linkage
from scipy.spatial.distance import pdist
## Load dataset
X = np.load("dataset.npy")
## Hierarchical clustering
dists = pdist(X)  # condensed Euclidean distance matrix
Z = linkage(dists, method='centroid', metric='euclidean')
print(is_valid_linkage(Z))
## Now let's say we want the flat cluster assignment with 10 clusters.
# If cut_tree() was working we would do
from scipy.cluster.hierarchy import cut_tree
cut = cut_tree(Z, 10)
Sidenote: an alternative approach might be to use R's cutree() through rpy2 as a substitute for scipy's cut_tree(), but I have never used it. What do you think?
One way to obtain k flat clusters is to use scipy.cluster.hierarchy.fcluster with criterion='maxclust':
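from scipy.cluster.hierarchy import fcluster
k = 10  # requested number of flat clusters
# Labels start at 1; 'maxclust' forms no more than k flat clusters
clust = fcluster(Z, k, criterion='maxclust')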
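If you would rather compute the cut from scratch, here is a minimal sketch of one way to do it. It is not SciPy's own implementation. It assumes that "k clusters" means the state of the hierarchy after replaying the first n - k rows of Z in order, and that Z follows the SciPy layout in which row i merges clusters Z[i][0] and Z[i][1] into a new cluster with index n + i. The function name flat_clusters_from_linkage and the toy matrix are hypothetical, made up for illustration.

```python
def flat_clusters_from_linkage(Z, k):
    """Replay the first n - k merges of a linkage matrix and label the result.

    Assumption: Z uses the SciPy layout, where row i merges clusters
    Z[i][0] and Z[i][1] into a new cluster with index n + i.
    Returns a list of 0-based labels, one per original observation.
    """
    n = len(Z) + 1  # number of original observations
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of observations")

    # Union-find forest over the n leaves and the n - 1 merged nodes.
    parent = list(range(2 * n - 1))

    def find(i):
        # Follow parent links up to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Each merge removes exactly one cluster, so n - k merges leave k clusters.
    for i in range(n - k):
        a, b = int(Z[i][0]), int(Z[i][1])
        new_node = n + i
        parent[find(a)] = new_node
        parent[find(b)] = new_node

    # Number the clusters in order of first appearance among the observations.
    labels, seen = [], {}
    for leaf in range(n):
        root = find(leaf)
        labels.append(seen.setdefault(root, len(seen)))
    return labels


# Hand-written toy linkage for 4 observations (illustrative data only):
# 0 and 1 merge into node 4, 2 and 3 merge into node 5, then 4 and 5 merge.
Z_toy = [
    [0, 1, 1.0, 2],
    [2, 3, 1.5, 2],
    [4, 5, 3.0, 4],
]

print(flat_clusters_from_linkage(Z_toy, 2))  # [0, 0, 1, 1]
print(flat_clusters_from_linkage(Z_toy, 3))  # [0, 0, 1, 2]
```

Because the function only indexes Z row by row, it should also accept the NumPy array returned by linkage(). The merge distances in the third column are never read, so the number of labels is always exactly k, even when those distances are not monotonically increasing.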