计算树状图叶的排序 [英] Calculate ordering of dendrogram leaves

查看:123
本文介绍了计算树状图叶的排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有五个要点,我需要根据这些要点创建树状图。可以使用树状图功能来查找这些点的顺序,如下所示。但是,我不想使用树状图,因为它速度慢并且会导致大量点错误(我在这里问这个问题找到树状图的Python替代方法)。有人可以指出我如何将链接输出(Z)转换为树状图(Z)[’ivl]值。

I have five points and I need to create dendrogram from these. The function 'dendrogram' can be used to find the ordering of these points as shown below. However, I do not want to use dendrogram as it is slow and result in error for large number of points (I asked this question here Python alternate way to find dendrogram). Can someone points me how to convert the 'linkage' output (Z) to the "dendrogram(Z)['ivl']" value.

>>> from hcluster import pdist, linkage, dendrogram
>>> import numpy
>>> from numpy.random import rand
>>> x = rand(5,3)
>>> Y = pdist(x)
>>> Z = linkage(Y)
>>> Z
array([[ 1.        ,  3.        ,  0.11443378,  2.        ],
       [ 0.        ,  4.        ,  0.47941843,  2.        ],
       [ 5.        ,  6.        ,  0.67596472,  4.        ],
       [ 2.        ,  7.        ,  0.79993986,  5.        ]])
>>> 



>>> dendrogram(Z)['ivl']
['2', '1', '3', '0', '4']
>>> 


推荐答案

为什么慢?当然,计算链接聚类的天真的方法是 O(n ^ 3),但是对于 n = 5 价格便宜...

Why is it slow? Sure, the naive way of computing linkage clustering is O(n^3) but for n=5 this is as cheap as it gets...

有关scipy链接矩阵的格式,请参见以下问题:
scipy链接格式

For the format of the scipy linkage matrix, see this question: scipy linkage format

请注意,您可能仍需要对数据进行最佳排序。上面的链接矩阵编码给出

Note that you may still need to sort the data optimally. The linkage matrix encoding above gives


  • 元素1和聚类3在高度0.1144处连接(进入2元素聚类#5)

  • 元素0和群集4在高度0.7999处连接(进入2个元素群集,#6)

  • 群集5和群集6在高度0.6759处连接(进入a 4元素群集#7)

  • 元素2和群集7在高度0.7999处接合(进入5元素群集#8)

  • Element 1 and Cluster 3 join at height 0.1144 (into a 2 element cluster, #5)
  • Element 0 and Cluster 4 join at height 0.7999 (into a 2 element cluster, #6)
  • Cluster 5 and Cluster 6 join at height 0.6759 (into a 4 element cluster, #7)
  • Element 2 and Cluster 7 join at height 0.7999 (into a 5 element cluster, #8)

,但是它可能是通过链接距离来排序的,而不是以一维的顺序进行可视化处理(因为不是每个使用链接聚类的人都想在之后进行树状图虚拟化)。但是无论如何,如果确实需要排序,则计算树状图的数量级应为 O(n log n),与实际聚类相比相当便宜。

but it might be ordered by linking distance, and not in a 1d ordering for visualization (because not everbody using linkage clustering will want to run dendrogram viusalization afterwards). But in any way, computing the dendrogram should be on the order of O(n log n) if you do need to sort, fairly cheap compared to the actual clustering.

遵循这些原则应该可以解决问题:

Something along these lines should do the trick:

n = len(Z) + 1
cache = dict()
for k in range(len(Z)):
  c1, c2 = int(Z[k][0]), int(Z[k][1])
  c1 = [c1] if c1 < n else cache.pop(c1)
  c2 = [c2] if c2 < n else cache.pop(c2)
  cache[n+k] = c1 + c2
print cache[2*len(Z)]

这似乎是线性的,但是数组的预期大小为 log n ,因此取决于您的列表类型,它可能仍然是 O(n log n),而对于链接列表,它确实应该在 O(n)

This may appear to be linear, but the expected size of the arrays is log n, so depending on your list types it may still be O(n log n), while with linked lists it should indeed be doable in O(n).

但是最后,您可能想要避免分层聚类。这是聚类分析的流行介绍性示例,因为从概念上确实很容易理解。有一些非常棘手的算法(SLINK)可以将复杂度降低到 O(n ^ 2)。但是,有更多现代且功能强大的聚类算法具有较低的复杂度。实际上, OPTICS(维基百科)可以计算出非常相似的值(当您设置minPts = 2时),并且如果索引结构良好,它将在 O(n log n)中运行。另外,您可以增加minPts以获得更有意义的群集。 (但不要在Weka中使用OPTICS,也不要在浮动的python版本中使用它-否则它们都是不完整的或有故障的!)

But in the end, you might want to avoid hierarchical clustering. It is a popular introductory example to cluster analysis, because it is really easy to understand conceptually. There are some quite tricky algorithms (SLINK) to get it down to O(n^2) complexity. But there are more modern and powerful clustering algorithms that have lower complexity. Actually, OPTICS (Wikipedia) computes something quite similar (when you set minPts=2), and when you have a good index structure it will run in O(n log n). Plus you can increase minPts to get more meaningful clusters. (But do not use OPTICS in Weka, or that python version that is floating around - afaict they are both incomplete or buggy!)

这篇关于计算树状图叶的排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆