Pruning dendrogram in scipy (hierarchical clustering)


Question

I have a distance matrix with about 5000 entries, and use scipy's hierarchical clustering methods to cluster the matrix. The code I use for this is the following snippet:

import fastcluster
import scipy.cluster.hierarchy as sch
Y = fastcluster.linkage(D, method='centroid')  # D: distance matrix
Z1 = sch.dendrogram(Y, truncate_mode='level', p=7, show_contracted=True)
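A self-contained sketch of the same pipeline, for experimenting. It assumes scipy's own `scipy.cluster.hierarchy.linkage` in place of `fastcluster.linkage` (both accept the `(distances, method='centroid')` call used above) and 200 hypothetical random 2-D points in place of the real data. Passing `no_plot=True` returns the dendrogram dictionary without drawing anything:

```python
import numpy as np
from scipy.spatial.distance import pdist
import scipy.cluster.hierarchy as sch

# Hypothetical data: 200 random 2-D points stand in for the real observations.
rng = np.random.default_rng(0)
X = rng.random((200, 2))

# Condensed Euclidean distance matrix; 'centroid' linkage assumes Euclidean distances.
D = pdist(X)
Y = sch.linkage(D, method='centroid')

# Show at most 7 levels below the final merge; no_plot=True skips matplotlib.
Z1 = sch.dendrogram(Y, truncate_mode='level', p=7, show_contracted=True, no_plot=True)

print(len(Z1['ivl']))  # number of visible leaves after truncation
print(Z1['ivl'][:5])   # leaf labels; a contracted branch is labelled "(count)" by default
```

The counts shown for contracted branches, plus one for each singleton leaf, add up to the number of original observations, but the labels alone do not say which observations those are.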

Since the dendrogram will become rather dense with all this data, I use the truncate_mode to prune it a bit. All of this works, but I wonder how I can find out which of the original 5000 entries belong to a particular branch in the dendrogram.

I tried using

 leaves = sch.leaves_list(Y)

to get a list of leaves, but this takes the linkage output as input, and while I can see the correspondence between the pruned dendrogram and the leaves list, mapping the original entries onto the dendrogram by hand becomes cumbersome.
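For reference, a minimal sketch of what `sch.leaves_list` returns, using 10 hypothetical random points. It works on the full, untruncated tree: the result is the original observation indices in the left-to-right order of the dendrogram's leaves, so it is a permutation of `0` to `n-1` and is unaffected by `truncate_mode`:

```python
import numpy as np
from scipy.spatial.distance import pdist
import scipy.cluster.hierarchy as sch

# Hypothetical data: 10 random 2-D points.
rng = np.random.default_rng(1)
X = rng.random((10, 2))
Y = sch.linkage(pdist(X), method='centroid')

# Original indices, ordered as the leaves of the full dendrogram.
leaves = sch.leaves_list(Y)
print(leaves)

# Without truncation (and default sorting), this matches the 'leaves' entry of dendrogram().
full = sch.dendrogram(Y, no_plot=True)
print(leaves.tolist() == full['leaves'])
```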

To summarize: is there a way to list all the original entries in the distance matrix that belong to a branch of a pruned dendrogram? Or are there other ways of doing this that I am not aware of?

Thanks

Answer

The dictionary returned by scipy.cluster.hierarchy.dendrogram has a key, ivl, which the documentation describes as:

a list of labels corresponding to the leaf nodes

You can supply custom labels (using labels=<array of labels>) as input to the dendrogram function, but by default they are just the indices of the original observations. By comparing the original labels/indices with Z1['ivl'], you can determine what the original entries were.
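When the tree is truncated, though, ivl only lists the visible leaves: by default a contracted branch appears as a count such as "(12)" rather than as the indices it contains. One way to recover those indices (a sketch, not part of the original answer) is the dictionary's `leaves` key, which holds node ids in the same order as `ivl`: ids below `n` are original observations, and ids of `n` or more are clusters from the linkage matrix. `sch.to_tree(Y, rd=True)` returns a list of `ClusterNode` objects indexed by id, and `ClusterNode.pre_order()` lists the original indices under a node. The random data and the `branch_members` helper are hypothetical:

```python
import numpy as np
from scipy.spatial.distance import pdist
import scipy.cluster.hierarchy as sch

# Hypothetical data: 200 random 2-D points.
rng = np.random.default_rng(0)
X = rng.random((200, 2))
Y = sch.linkage(pdist(X), method='centroid')

Z1 = sch.dendrogram(Y, truncate_mode='level', p=7, no_plot=True)

# nodes[j] is the ClusterNode with id j (0 <= j < 2n - 1).
_, nodes = sch.to_tree(Y, rd=True)


def branch_members(node_id):
    """Hypothetical helper: original indices under dendrogram node node_id."""
    # pre_order() returns the ids of the original observations below a node;
    # for a singleton node it is just [node_id].
    return nodes[node_id].pre_order()


# One list per visible leaf, in the same left-to-right order as Z1['ivl'].
branches = [branch_members(j) for j in Z1['leaves']]

for label, members in zip(Z1['ivl'][:3], branches[:3]):
    print(label, members)
```

Each visible branch then maps to an explicit list of row indices into the original distance matrix, which can be passed along with any custom labels.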