Spectral Clustering a graph in python


Problem description

    I'd like to cluster a graph in python using spectral clustering.

    Spectral clustering is a more general technique which can be applied not only to graphs, but also to images or any sort of data; however, it's considered an exceptional graph clustering technique. Sadly, I can't find examples of spectral clustering of graphs in python online.

    I'd love some direction on how to go about this. If someone can help me figure it out, I can add the documentation to scikit-learn.

    Notes:

      • A question very much like this one has already been asked on this site.
    Solution

    Without much experience with Spectral-clustering and just going by the docs (skip to the end for the results!):

    Code:

    import numpy as np
    import networkx as nx
    from sklearn.cluster import SpectralClustering
    from sklearn import metrics
    np.random.seed(1)
    
    # Get your mentioned graph
    G = nx.karate_club_graph()
    
    # Get ground-truth: club-labels -> transform to 0/1 np-array
    #     (possible overcomplicated networkx usage here)
    gt_dict = nx.get_node_attributes(G, 'club')
    gt = [gt_dict[i] for i in G.nodes()]
    gt = np.array([0 if i == 'Mr. Hi' else 1 for i in gt])
    
    # Get adjacency-matrix as numpy-array
    adj_mat = nx.to_numpy_matrix(G)
    
    print('ground truth')
    print(gt)
    
    # Cluster
    sc = SpectralClustering(2, affinity='precomputed', n_init=100)
    sc.fit(adj_mat)
    
    # Compare ground-truth and clustering-results
    print('spectral clustering')
    print(sc.labels_)
    print('just for better-visualization: invert clusters (permutation)')
    print(np.abs(sc.labels_ - 1))
    
    # Calculate some clustering metrics
    print(metrics.adjusted_rand_score(gt, sc.labels_))
    print(metrics.adjusted_mutual_info_score(gt, sc.labels_))
    

    Output:

    ground truth
    [0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1]
    spectral clustering
    [1 1 0 1 1 1 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
    just for better-visualization: invert clusters (permutation)
    [0 0 1 0 0 0 0 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
    0.204094758281
    0.271689477828
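
    The inverted print above is only a visualization aid: the cluster ids that SpectralClustering returns are arbitrary, which is why permutation-invariant scores (adjusted Rand index, adjusted mutual information) are used for the comparison. If a plain accuracy number is wanted, a minimal sketch could try every mapping of predicted ids onto ground-truth ids (best_label_accuracy below is just an illustrative helper, not a library function):

    from itertools import permutations
    import numpy as np

    def best_label_accuracy(gt, pred):
        # Try every assignment of predicted cluster ids to ground-truth ids
        # and keep the best plain accuracy (handles the label permutation).
        pred_ids = np.unique(pred)
        best = 0.0
        for perm in permutations(np.unique(gt), len(pred_ids)):
            mapping = dict(zip(pred_ids, perm))
            remapped = np.array([mapping[p] for p in pred])
            best = max(best, float(np.mean(remapped == gt)))
        return best

    # e.g. best_label_accuracy(gt, sc.labels_)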
    

    The general idea:

    Introduction on the data and task from here:

    The nodes in the graph represent the 34 members in a college Karate club. (Zachary is a sociologist, and he was one of the members.) An edge between two nodes indicates that the two members spent significant time together outside normal club meetings. The dataset is interesting because while Zachary was collecting his data, there was a dispute in the Karate club, and it split into two factions: one led by "Mr. Hi", and one led by "John A". It turns out that using only the connectivity information (the edges), it is possible to recover the two factions.

    Using sklearn & spectral-clustering to tackle this:

    If affinity is the adjacency matrix of a graph, this method can be used to find normalized graph cuts.

    This describes normalized graph cuts as:

    Find two disjoint partitions A and B of the vertices V of a graph, so that A ∪ B = V and A ∩ B = ∅

    Given a similarity measure w(i,j) between two vertices (e.g. identity when they are connected) a cut value (and its normalized version) is defined as: cut(A, B) = SUM u in A, v in B: w(u, v)

    ...

    we seek the minimization of disassociation between the groups A and B and the maximization of the association within each group
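
    To make the criterion concrete: with the adjacency matrix W as the similarity measure, a minimal sketch (normalized_cut is just an illustrative helper, not part of the answer's code) could evaluate cut(A, B) and Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V) for the partition that the clustering returns:

    import numpy as np

    def normalized_cut(W, labels):
        # W: symmetric (n, n) affinity/adjacency matrix
        # labels: 0/1 array splitting the vertices into A (label 0) and B (label 1)
        W = np.asarray(W)
        A = labels == 0
        B = labels == 1
        cut = W[np.ix_(A, B)].sum()   # total weight crossing the partition
        assoc_A = W[A, :].sum()       # assoc(A, V): total weight attached to A
        assoc_B = W[B, :].sum()       # assoc(B, V): total weight attached to B
        return cut, cut / assoc_A + cut / assoc_B

    # e.g. normalized_cut(adj_mat, sc.labels_)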

    Sounds alright. So we create the adjacency matrix (nx.to_numpy_matrix(G)) and set the param affinity to 'precomputed' (as our adjacency-matrix is our precomputed similarity-measure).

    Alternatively, using precomputed, a user-provided affinity matrix can be used.
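
    Note (not part of the original answer): newer networkx releases (3.x) removed nx.to_numpy_matrix, so with current library versions the same precomputed setup could be built from a plain numpy array instead, roughly like this sketch:

    import networkx as nx
    from sklearn.cluster import SpectralClustering

    G = nx.karate_club_graph()

    # nx.to_numpy_array replaces nx.to_numpy_matrix in current networkx
    # and returns a plain ndarray, which SpectralClustering accepts directly.
    adj_mat = nx.to_numpy_array(G)

    sc = SpectralClustering(2, affinity='precomputed', n_init=100)
    labels = sc.fit_predict(adj_mat)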

    Edit: While unfamiliar with this, I looked for parameters to tune and found assign_labels:

    The strategy to use to assign labels in the embedding space. There are two ways to assign labels after the laplacian embedding. k-means can be applied and is a popular choice. But it can also be sensitive to initialization. Discretization is another approach which is less sensitive to random initialization.

    So trying the less sensitive approach:

    sc = SpectralClustering(2, affinity='precomputed', n_init=100, assign_labels='discretize')
    

    Output:

    ground truth
    [0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1]
    spectral clustering
    [0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 1 0 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1]
    just for better-visualization: invert clusters (permutation)
    [1 1 0 1 1 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0]
    0.771725032425
    0.722546051351
    

    That's a pretty much perfect fit to the ground-truth!
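
    To inspect the result per member rather than as a bare label array, one could pair each node with its true club attribute and its predicted cluster (a small sketch reusing G, gt_dict and sc from the code above):

    # Print node id, true club label and predicted cluster side by side.
    for node, label in zip(G.nodes(), sc.labels_):
        print(node, gt_dict[node], label)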
