How do I get the subtrees of a dendrogram made by scipy.cluster.hierarchy


Question

I was confused by this module (scipy.cluster.hierarchy), and I still have some questions.

For example, we have the following dendrogram:

My question is: how can I extract the coloured subtrees (each one represents a cluster) in a convenient format, say SIF? The code that produces the plot above is:

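import numpy as np
import scipy.cluster.hierarchy as sch
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist

# 100 random two-dimensional observations
X = np.random.randn(100, 2)

# condensed matrix of pairwise distances
d = pdist(X)

# complete-linkage hierarchical clustering
Z = sch.linkage(d, method='complete')

P = sch.dendrogram(Z)

plt.savefig('plot_dendrogram.png')

# flat clusters, cutting the tree at half the largest pairwise distance
T = sch.fcluster(Z, 0.5*d.max(), 'distance')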
#array([4, 5, 3, 2, 2, 3, 5, 2, 2, 5, 2, 2, 2, 3, 2, 3, 2, 5, 4, 5, 2, 5, 2,
#       3, 3, 3, 1, 3, 4, 2, 2, 4, 2, 4, 3, 3, 2, 5, 5, 5, 3, 2, 2, 2, 5, 4,
#       2, 4, 2, 2, 5, 5, 1, 2, 3, 2, 2, 5, 4, 2, 5, 4, 3, 5, 4, 4, 2, 2, 2,
#       4, 2, 5, 2, 2, 3, 3, 2, 4, 5, 3, 4, 4, 2, 1, 5, 4, 2, 2, 5, 5, 2, 2,
#       5, 5, 5, 4, 3, 3, 2, 4], dtype=int32)

sch.leaders(Z,T)
# (array([190, 191, 182, 193, 194], dtype=int32),
#  array([2, 3, 1, 4,5],dtype=int32))

So now the output of fcluster() gives the clustering of the nodes (by their ids), and leaders() is supposed to return two arrays:

  • the first one contains the leader nodes of the clusters generated from Z; here we can see that we have 5 clusters, as in the plot

  • the second one contains the ids of these clusters

So if leaders() returns L and M respectively, with L[2]=182 and M[2]=1, then cluster 1 is led by node id 182, which doesn't exist in the observation set X. The documentation says "... then it corresponds to a non-singleton cluster", but I don't understand what that means.
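The following is a minimal sketch of what that sentence in the documentation appears to mean. It assumes the usual SciPy convention: ids 0 to n-1 are the original observations, and id n+i is the cluster created by row i of Z. A leader id such as 182 (with n = 100) is therefore an internal node of the tree, not a row of X. The helper name members_of is hypothetical, and the data are regenerated with a fixed seed, so the ids will differ from those shown above.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
d = pdist(X)
Z = sch.linkage(d, method='complete')
T = sch.fcluster(Z, 0.5 * d.max(), 'distance')
L, M = sch.leaders(Z, T)

n = X.shape[0]
# rd=True also returns a list in which nodes[i] is the ClusterNode with id i
root, nodes = sch.to_tree(Z, rd=True)

def members_of(leader_id):
    """Sorted observation ids below the node with this id (hypothetical helper)."""
    return sorted(nodes[leader_id].pre_order())

for leader_id, cluster_id in zip(L, M):
    kind = "singleton" if leader_id < n else "non-singleton"
    print(cluster_id, leader_id, kind, members_of(leader_id))
```

Each printed list should match the observations that fcluster() assigned to the same cluster id, which is one way to check that the leader really is the root of that coloured subtree.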

Also, I converted Z to a tree with sch.to_tree(Z), which returns an easy-to-use tree object that I want to visualize. Which tool should I use as a graphical platform that accepts this kind of tree object as input?
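As a rough sketch of the SIF export asked about above, the tree returned by to_tree() can be walked from each cluster leader and written out as one "parent relation child" line per edge. SIF is the simple interaction format read by Cytoscape. The function name subtree_to_sif and the relation label "parent_of" are assumptions made for illustration, not part of SciPy.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

def subtree_to_sif(node, relation="parent_of"):
    """Return tab-separated SIF lines for the subtree rooted at node."""
    lines = []
    stack = [node]
    while stack:
        current = stack.pop()
        for child in (current.get_left(), current.get_right()):
            if child is not None:
                lines.append(f"{current.get_id()}\t{relation}\t{child.get_id()}")
                stack.append(child)
    return lines

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))
d = pdist(X)
Z = sch.linkage(d, method='complete')
T = sch.fcluster(Z, 0.5 * d.max(), 'distance')
L, M = sch.leaders(Z, T)
root, nodes = sch.to_tree(Z, rd=True)  # nodes[i] is the node with id i

for leader_id, cluster_id in zip(L, M):
    print(f"# cluster {cluster_id}, leader {leader_id}")
    print("\n".join(subtree_to_sif(nodes[leader_id])))
```

A cluster made of a single observation has no edges, so it produces an empty list; whether such nodes should still be written to the file depends on the tool that reads it.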

Recommended answer

Answering the part of your question regarding tree manipulation...

As explained in another answer, you can read the coordinates of the branches from the icoord and dcoord entries of the object returned by dendrogram() (P in your code). For each branch, the coordinates are given from left to right.
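To make the layout of those coordinates concrete, here is a small sketch on a 10-point example of my own (not the data from the question). Each branch is a U-shaped link described by four x values in icoord and four y values in dcoord; no_plot=True only computes the coordinates without drawing.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 2))
Z = sch.linkage(pdist(X), method='complete')
P = sch.dendrogram(Z, no_plot=True)

icoord = np.array(P['icoord'])  # x coordinates, one row per link
dcoord = np.array(P['dcoord'])  # y coordinates (heights), one row per link

# n observations give n - 1 links, each with 4 points
print(icoord.shape, dcoord.shape)
# first link drawn: left leg, horizontal bar, right leg
print(icoord[0], dcoord[0])
```

The two middle values of each dcoord row are the height of the horizontal bar, that is, the distance at which the two child clusters were merged.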

If you want to plot the tree manually, you can use something like:

import numpy as np
import matplotlib.pyplot as plt

def plot_tree(P, pos=None):
    """Plot the branches of the dendrogram dict P, optionally only those at indices pos."""
    plt.clf()
    icoord = np.array(P['icoord'])
    dcoord = np.array(P['dcoord'])
    color_list = np.array(P['color_list'])
    # axis limits come from the full tree, so a partial plot keeps the same scale
    xmin, xmax = icoord.min(), icoord.max()
    ymin, ymax = dcoord.min(), dcoord.max()
    if pos is not None:
        icoord = icoord[pos]
        dcoord = dcoord[pos]
        color_list = color_list[pos]
    for xs, ys, color in zip(icoord, dcoord, color_list):
        plt.plot(xs, ys, color=color)
    plt.xlim(xmin-10, xmax + 0.1*abs(xmax))
    plt.ylim(ymin, ymax + 0.1*abs(ymax))
    plt.show()

With your code, plot_tree(P) reproduces the full dendrogram.

The function also lets you select just some of the branches:

plot_tree(P, range(10))

Now you have to know which branches to plot. The fcluster() output may be a little obscure for this, so another way to find the branches to plot, based on a minimum and a maximum distance tolerance, is to use the output of linkage() directly (Z in the OP's case):

import numpy as np

dmin = 0.2
dmax = 0.3
# indices of the merges whose distance lies in [dmin, dmax]
pos = np.all((Z[:, 2] >= dmin, Z[:, 2] <= dmax), axis=0).nonzero()
plot_tree(P, pos)
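One caveat with the snippet above, as far as I can tell: the rows of Z are in merge order, whereas the entries of P['icoord'] and P['dcoord'] are in the order in which dendrogram() draws the links, so an index taken from Z does not necessarily point at the same link in P. A hedged alternative is to filter on the heights stored in dcoord itself. The sketch below regenerates its own data and is not the original author's method.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
Z = sch.linkage(pdist(X), method='complete')
P = sch.dendrogram(Z, no_plot=True)

dmin, dmax = 0.2, 0.3
# the second value of each dcoord row is the merge height of that link
heights = np.array(P['dcoord'])[:, 1]
pos = np.flatnonzero((heights >= dmin) & (heights <= dmax)).tolist()
print(len(pos), "links selected")
```

The resulting pos can then be passed as the second argument of plot_tree(), since it indexes the same arrays that the function plots.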

Recommended references:

  • How does condensed distance matrix work? (pdist)
  • how to plot and annotate hierarchical clustering dendrograms in scipy/matplotlib
