How to get non-singleton cluster ids in scipy hierarchical clustering

Question

According to this we can get labels for non-singleton clusters.

I tried this with a simple example.

import numpy as np
import scipy.cluster.hierarchy
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

mat = np.array([[ 0. , 1. , 3.  ,0. ,2.  ,3.  ,1.],
 [ 1. , 0. , 3. , 1.,  1. , 2. , 2.],
 [ 3.,  3. , 0.,  3. , 3.,  3. , 4.],
 [ 0. , 1. , 3.,  0. , 2. , 3.,  1.],
 [ 2. , 1.,  3. , 2.,  0. , 1.,  3.],
 [ 3. , 2.,  3. , 3. , 1. , 0. , 3.],
 [ 1. , 2.,  4. , 1. , 3.,  3. , 0.]])

def llf(id):
    if id < n:
        return str(id)
    else:
        return '[%d %d %1.2f]' % (id, count, R[n-id,3])


linkage_matrix = linkage(mat, "complete")

dendrogram(linkage_matrix,
           p=4,
           leaf_label_func=llf,
           color_threshold=1,
           truncate_mode='lastp',
           distance_sort='ascending')

plt.show()

What are n and count here? In a dendrogram like the one this code produces, I need to know which observations are listed under (3) and (2).

Recommended answer

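I think the documentation is not very clear on this point, and its sample code does not even run as written. What is clear is that a plain leaf label such as 1 is the zero-based index of an original observation (so 1 is the 2nd observation), while a label in parentheses such as (3) means that the contracted node contains 3 observations.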
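For reference, here is one way the documentation's label function could be made runnable. This is a minimal sketch, not the documentation's own code: it assumes n is meant to be the number of original observations and count the size of the cluster, both of which can be read from the linkage matrix (row id - n, column 2 for the merge distance, column 3 for the observation count). The helper name make_llf is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

mat = np.array([[0., 1., 3., 0., 2., 3., 1.],
                [1., 0., 3., 1., 1., 2., 2.],
                [3., 3., 0., 3., 3., 3., 4.],
                [0., 1., 3., 0., 2., 3., 1.],
                [2., 1., 3., 2., 0., 1., 3.],
                [3., 2., 3., 3., 1., 0., 3.],
                [1., 2., 4., 1., 3., 3., 0.]])

# Passed as in the question: linkage treats each row as an observation vector.
Z = linkage(mat, "complete")


def make_llf(Z):
    """Build a leaf label function for a given linkage matrix (hypothetical helper)."""
    n = Z.shape[0] + 1  # assumption: n is the number of original observations

    def llf(node_id):
        if node_id < n:
            # Singleton leaf: label it with the observation index.
            return str(node_id)
        # Non-singleton node: its merge is stored in row node_id - n of Z.
        row = Z[node_id - n]
        count = int(row[3])   # number of observations in the cluster
        distance = row[2]     # distance at which the cluster was formed
        return '[%d %d %1.2f]' % (node_id, count, distance)

    return llf


llf = make_llf(Z)

# no_plot=True only computes the layout, so matplotlib is not needed here.
R = dendrogram(Z, p=4, truncate_mode='lastp',
               leaf_label_func=llf, no_plot=True)
print(R['ivl'])
```

With this function each contracted leaf is labelled with its node id, its observation count and its merge distance instead of the default (3) or (2).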

If your question is which 3 observations are in the second node, you can find out like this:

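In [51]:
from itertools import groupby

# Truncated dendrogram: only the last p=4 clusters are drawn as leaves.
D4=dendrogram(linkage_matrix,
              color_threshold=1,
              p=4,
              truncate_mode='lastp',
              distance_sort='ascending')
# Full dendrogram: p=7 keeps all 7 observations as leaves; no plot is drawn.
# The original call also passed color_list=['g',]*7, which recent SciPy
# releases no longer accept, so it is omitted here.
D7=dendrogram(linkage_matrix,
              p=7,
              truncate_mode='lastp',
              distance_sort='ascending', no_plot=True)
# Leaves of D7 that are missing from D4 were folded into a contracted node.
[list(group) for key, group in groupby(D7['ivl'],lambda x: x in D4['ivl'])]
Out[51]:
[['1'], ['6', '0', '3'], ['2'], ['4', '5']]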

The (3) node contains the 7th, 1st and 4th observations (indices 6, 0 and 3), and the (2) node contains the 5th and 6th observations (indices 4 and 5).
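The same membership can also be read directly from the tree, without comparing two dendrograms. The following is a minimal sketch under the assumption that the 'leaves' entry returned by dendrogram holds cluster node ids (ids of n or more being contracted non-singleton nodes); it uses to_tree and ClusterNode.pre_order from scipy.cluster.hierarchy. The variable name members is only illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage, to_tree

mat = np.array([[0., 1., 3., 0., 2., 3., 1.],
                [1., 0., 3., 1., 1., 2., 2.],
                [3., 3., 0., 3., 3., 3., 4.],
                [0., 1., 3., 0., 2., 3., 1.],
                [2., 1., 3., 2., 0., 1., 3.],
                [3., 2., 3., 3., 1., 0., 3.],
                [1., 2., 4., 1., 3., 3., 0.]])

Z = linkage(mat, "complete")
n = Z.shape[0] + 1  # number of original observations

# Compute the truncated layout only; 'leaves' lists the node id of each leaf.
R = dendrogram(Z, p=4, truncate_mode='lastp', no_plot=True)

# rd=True also returns a list in which nodes[i] is the ClusterNode with id i.
root, nodes = to_tree(Z, rd=True)

# For every contracted (non-singleton) leaf, collect the original
# observation indices found underneath it.
members = {
    node_id: sorted(nodes[node_id].pre_order())
    for node_id in R['leaves']
    if node_id >= n
}

for node_id, observations in members.items():
    print(node_id, observations)
```

In this data, observations 1 and 6 are equally far from the pair {0, 3}, so which of them ends up in the three-member node may depend on how the SciPy version in use breaks the tie.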
