Extract path from root to leaf in sklearn's agglomerative clustering


Question

Given a specific leaf node of the hierarchy built by sklearn.cluster.AgglomerativeClustering, I am trying to identify the path from the root node (all data points) to that leaf and, for each intermediate step (internal node of the tree), the list of corresponding data points; see the example below.

In this example, I consider five data points and focus on point 3: I want to extract the instances considered at each step, starting at the root and ending at leaf 3, so the desired result would be [[1,2,3,4,5],[1,3,4,5],[3,4],[3]]. How can I achieve this with sklearn (or, if that is not possible, with a different library)?

Recommended answer

The code below first finds all ancestors of the focus point (using the find_ancestor function below), then finds and collects all descendants of each ancestor (find_descendent).

First, load the data and fit the model:

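from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris

iris = load_iris()
N = 10  # number of samples, i.e. number of leaves in the tree
x = iris.data[:N]
# compute_full_tree=True builds the complete merge tree up to the root
model = AgglomerativeClustering(compute_full_tree=True).fit(x)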

Here is the main code (find_ancestor and find_descendent are defined further down):

ans = []
for a in find_ancestor(3)[::-1]:
    ans.append(find_descendent(a))
print(ans)

Which outputs in my case:

[[1, 9, 8, 6, 2, 3, 5, 7, 0, 4],
 [1, 9, 8, 6, 2, 3],
 [8, 6, 2, 3],
 [6, 2, 3],
 [2, 3],
 [3]]

To understand the code of find_ancestor, remember that leaves have indices 0 to N-1 and that the two children of the non-leaf node with index N + i are stored at model.children_[i]. The last row of model.children_ therefore describes the root.

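def find_ancestor(target):
    # Returns [target, parent, grandparent, ..., root].
    for ind, pair in enumerate(model.children_):
        if target in pair:
            # Row ind of children_ describes internal node N + ind.
            return [target] + find_ancestor(N + ind)
    # No row contains target, so target is the root (the last merge).
    return [ind + N]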
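
The indexing convention that find_ancestor relies on can be made concrete with a small hand-written array. This is only a sketch: toy_children, n_leaves and describe_nodes are hypothetical names, and the array merely mimics the layout of model.children_ for five points rather than being produced by scikit-learn.

```python
# Hypothetical merge table with the same layout as model.children_
# for 5 leaves (ids 0..4); it is NOT the output of a fitted model.
n_leaves = 5
toy_children = [
    [2, 3],  # row 0 -> internal node 5
    [0, 4],  # row 1 -> internal node 6
    [5, 6],  # row 2 -> internal node 7
    [1, 7],  # row 3 -> internal node 8 (the root)
]


def describe_nodes(children, n_leaves):
    """Map each internal node id (n_leaves + row) to its two children."""
    return {n_leaves + row: tuple(pair) for row, pair in enumerate(children)}


nodes = describe_nodes(toy_children, n_leaves)
for node_id, (left, right) in nodes.items():
    print(f"node {node_id}: children {left} and {right}")
```

With N leaves there are N - 1 merges, so the root always has id 2N - 2 (here 8).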

The recursive find_descendent uses mem to cache its results so that they are not needlessly recomputed.

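mem = {}  # cache: internal node id -> list of leaf ids below it
def find_descendent(node):
    if node in mem:
        return mem[node]
    if node < N:
        # A leaf is its own only descendant; int() keeps the printed list plain.
        return [int(node)]
    # The children of internal node `node` are in row node - N.
    pair = model.children_[node - N]
    mem[node] = find_descendent(pair[0]) + find_descendent(pair[1])
    return mem[node]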
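
As an alternative sketch, the same root-to-leaf path can be computed without recursion by building a parent lookup once instead of scanning children_ at every step. This is not the answer's method: root_to_leaf_path and toy_children are hypothetical names, the toy array is hand-written to mirror the five-point example from the question (with 0-based ids), and the code assumes that every row of the merge table only refers to nodes created in earlier rows.

```python
def root_to_leaf_path(children, n_leaves, leaf):
    """Return the leaf ids under every node on the path root -> leaf."""
    # parent[c] = id of the internal node that merged child c.
    parent = {}
    for row, (left, right) in enumerate(children):
        parent[int(left)] = n_leaves + row
        parent[int(right)] = n_leaves + row

    # members[node] = sorted leaf ids below node. Assumes children always
    # have smaller ids than their parent, so one forward pass is enough.
    members = {i: [i] for i in range(n_leaves)}
    for row, (left, right) in enumerate(children):
        members[n_leaves + row] = sorted(members[int(left)] + members[int(right)])

    # Walk upwards from the leaf, then reverse to get root -> leaf order.
    path = [leaf]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return [members[node] for node in reversed(path)]


# Hypothetical merge table for 5 leaves, laid out like model.children_.
toy_children = [[2, 3], [0, 4], [5, 6], [1, 7]]
path = root_to_leaf_path(toy_children, n_leaves=5, leaf=2)
print(path)
# [[0, 1, 2, 3, 4], [0, 2, 3, 4], [2, 3], [2]]
```

With the fitted model from above, the call would presumably be root_to_leaf_path(model.children_, N, 3). The lists come back sorted, so the element order can differ from the output of find_descendent.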
