Tutorial for scipy.cluster.hierarchy

Published: 2021/12/31 12:01:25 | Tags: python, scipy, hierarchical-clustering


Question

I'm trying to understand how to manipulate a hierarchical clustering, but the documentation is too ... technical? ... and I can't work out how it is meant to be used.

Is there a tutorial that can help me get started, explaining some simple tasks step by step?

Let's say I have the following data set:

a = np.array([[0,   0  ],
              [1,   0  ],
              [0,   1  ],
              [1,   1  ], 
              [0.5, 0  ],
              [0,   0.5],
              [0.5, 0.5],
              [2,   2  ],
              [2,   3  ],
              [3,   2  ],
              [3,   3  ]])

I can easily compute the hierarchical clustering and plot the dendrogram:

from scipy.cluster.hierarchy import linkage, dendrogram
z = linkage(a)
d = dendrogram(z)

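Now, how can I recover a specific cluster? Let's say the one containing the elements [0,1,2,4,5,6] in the dendrogram.
And how can I get back the values of those elements?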

Recommended answer

There are three steps in hierarchical agglomerative clustering (HAC):

Quantify the data (metric argument)
Cluster the data (method argument)
Choose the number of clusters
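The three steps can also be written out one call at a time. The following is a minimal sketch on the data set from the question; calling pdist and fcluster separately and asking for two clusters are assumptions made for illustration only, not part of the original answer.

```python
import numpy as np
from scipy.spatial.distance import pdist
import scipy.cluster.hierarchy as hac

# Data set from the question
a = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0], [0, 0.5],
              [0.5, 0.5], [2, 2], [2, 3], [3, 2], [3, 3]])

# Step 1: quantify the data (condensed pairwise distances for the metric)
dists = pdist(a, metric='euclidean')

# Step 2: cluster the data (build the hierarchy with the chosen method)
z = hac.linkage(dists, method='single')

# Step 3: choose the number of clusters (2 is an assumed value here)
labels = hac.fcluster(z, t=2, criterion='maxclust')
print(labels)
```

Passing the raw observations to linkage, as in the question, performs steps 1 and 2 in a single call.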

Doing

z = linkage(a)

accomplishes the first two steps. Since you did not specify any parameters, it uses the default values

metric = 'euclidean'
method = 'single'
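One way to check these defaults is to pass them explicitly and compare the results. The sketch below, again on the question's data, also looks at the layout of z: each row records one merge as [cluster index, cluster index, merge distance, number of points in the new cluster]. The variable names are illustrative.

```python
import numpy as np
import scipy.cluster.hierarchy as hac

a = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0], [0, 0.5],
              [0.5, 0.5], [2, 2], [2, 3], [3, 2], [3, 3]])

z_default = hac.linkage(a)
z_explicit = hac.linkage(a, method='single', metric='euclidean')

# The two calls should describe the same hierarchy
same = np.allclose(z_default, z_explicit)

# One row per merge: n - 1 rows for n observations, 4 columns each
n_merges, n_cols = z_default.shape

# The last row is the final merge, which joins all observations
last_merge_distance = z_default[-1, 2]
last_merge_size = int(z_default[-1, 3])
print(same, n_merges, n_cols, last_merge_distance, last_merge_size)
```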

So z = linkage(a) gives you a single-linkage hierarchical agglomerative clustering of a. This clustering is a hierarchy of solutions rather than a single partition, and the hierarchy already tells you something about the structure of your data. What you might do now is:

Check which metric is appropriate, e.g. cityblock or chebyshev will quantify your data differently (cityblock, euclidean and chebyshev correspond to the L1, L2 and L_inf norms)
Check the different properties / behaviours of the methods (e.g. single, complete and average)
Check how to determine the number of clusters, e.g. by reading the wiki about it
Compute indices on the solutions found (clusterings), such as the silhouette coefficient, which tells you how well each point/observation fits the cluster it was assigned to. Different indices use different criteria to assess a clustering.
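The checks in this list can be combined into a small experiment. The sketch below loops over three metrics and three methods, cuts each hierarchy into at most two clusters, and scores the result with a hand-written mean silhouette coefficient. The helper mean_silhouette is hypothetical (it is not a scipy function), and cutting at two clusters is an assumption for illustration.

```python
import numpy as np
import scipy.cluster.hierarchy as hac
from scipy.spatial.distance import pdist, squareform

a = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0], [0, 0.5],
              [0.5, 0.5], [2, 2], [2, 3], [3, 2], [3, 3]])


def mean_silhouette(points, labels, metric='euclidean'):
    """Hypothetical helper: mean silhouette coefficient of a flat clustering."""
    unique = np.unique(labels)
    if len(unique) < 2:
        return float('nan')  # undefined when there is only one cluster
    dist = squareform(pdist(points, metric=metric))
    scores = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton clusters score 0 by convention
        # Mean distance to the other members of the point's own cluster
        a_i = dist[i, same].sum() / (n_same - 1)
        # Smallest mean distance to the members of any other cluster
        b_i = min(dist[i, labels == other].mean()
                  for other in unique if other != labels[i])
        scores[i] = (b_i - a_i) / max(a_i, b_i)
    return scores.mean()


scores = {}
for metric in ['cityblock', 'euclidean', 'chebyshev']:
    for method in ['single', 'complete', 'average']:
        z = hac.linkage(a, method=method, metric=metric)
        labels = hac.fcluster(z, t=2, criterion='maxclust')
        scores[(metric, method)] = mean_silhouette(a, labels, metric)

for key, value in scores.items():
    print(key, value)
```

Values close to 1 indicate compact, well-separated clusters. A nan means the cut produced a single cluster, which can happen with 'maxclust' when the top merges are tied in height.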

Here is something to start with:

import numpy as np
import scipy.cluster.hierarchy as hac
import matplotlib.pyplot as plt


a = np.array([[0.1,   2.5],
              [1.5,   .4 ],
              [0.3,   1  ],
              [1  ,   .8 ],
              [0.5,   0  ],
              [0  ,   0.5],
              [0.5,   0.5],
              [2.7,   2  ],
              [2.2,   3.1],
              [3  ,   2  ],
              [3.2,   1.3]])

fig, axes23 = plt.subplots(2, 3)

for method, axes in zip(['single', 'complete'], axes23):
    z = hac.linkage(a, method=method)

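    # Scree plot: merge distances in reverse order (last merge first)
    axes[0].plot(range(1, len(z)+1), z[::-1, 2])
    # Second difference of the merge distances; a large value marks a knee
    knee = np.diff(z[::-1, 2], 2)
    axes[0].plot(range(2, len(z)), knee)

    # Two strongest knee candidates; +2 maps the index of the second
    # difference back to its position on the x-axis
    num_clust1 = knee.argmax() + 2
    knee[knee.argmax()] = 0
    num_clust2 = knee.argmax() + 2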

    axes[0].text(num_clust1, z[::-1, 2][num_clust1-1], 'possible\n<- knee point')

    part1 = hac.fcluster(z, num_clust1, 'maxclust')
    part2 = hac.fcluster(z, num_clust2, 'maxclust')

    clr = ['#2200CC' ,'#D9007E' ,'#FF6600' ,'#FFCC00' ,'#ACE600' ,'#0099CC' ,
    '#8900CC' ,'#FF0000' ,'#FF9900' ,'#FFFF00' ,'#00CC01' ,'#0055CC']

    for part, ax in zip([part1, part2], axes[1:]):
        for cluster in set(part):
            ax.scatter(a[part == cluster, 0], a[part == cluster, 1], 
                       color=clr[cluster])

    m = '\n(method: {})'.format(method)
    plt.setp(axes[0], title='Screeplot{}'.format(m), xlabel='partition',
             ylabel='{}\ncluster distance'.format(m))
    plt.setp(axes[1], title='{} Clusters'.format(num_clust1))
    plt.setp(axes[2], title='{} Clusters'.format(num_clust2))

plt.tight_layout()
plt.show()

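This gives a figure with one row of plots per linkage method: the scree plot, followed by scatter plots of the two candidate partitions.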
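To come back to the original question of recovering one specific cluster and the values of its elements: one option is to cut the hierarchy with fcluster and select the rows that share a label. The sketch below uses the data set from the question. The distance threshold of 0.6 is an assumption chosen for this particular data (the six points in question are chained together at distance 0.5, and point 3 only joins them at about 0.71).

```python
import numpy as np
import scipy.cluster.hierarchy as hac

# Data set from the question
a = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0], [0, 0.5],
              [0.5, 0.5], [2, 2], [2, 3], [3, 2], [3, 3]])

z = hac.linkage(a)  # single linkage, euclidean metric

# Cut the tree: points stay together if they merged at a distance <= 0.6
labels = hac.fcluster(z, t=0.6, criterion='distance')

# Pick the cluster that contains observation 0
target = labels[0]
members = np.flatnonzero(labels == target)  # indices into a
values = a[members]                         # coordinates of those points

print(members)  # [0 1 2 4 5 6]
print(values)
```

With criterion='maxclust' you specify the number of clusters instead of a cut height, as the plotting code above does.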
