绘制由 NetworkX Girvan-Newman 算法找到的社区的树状图 [英] Plot the dendrogram of communities found by NetworkX Girvan-Newman algorithm

查看:71
本文介绍了绘制由 NetworkX Girvan-Newman 算法找到的社区的树状图的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

用于网络社区检测的 Girvan-Newman 算法:

<块引用>

通过逐步去除原始边缘来检测社区图形.传统上,该算法去除了最有价值"的优势在每个步骤中具有最高中间性中心的边缘.作为图分解成碎片,紧密结合的社区结构是暴露,结果可以描述为树状图.

在 NetworkX 中,实现返回集合元组上的迭代器.第一个元组是由 2 个社区组成的第一个切割,第二个元组是由 3 个社区组成的第二个切割,依此类推,直到最后一个元组具有 n 个独立节点(树状图的叶子)的 n 个集合.

import networkx as nxG = nx.path_graph(10)comp = nx.community.girvan_newman(G)列表(comp)

<块引用>

[({0,1,2,3,4},{5,6,7,8,9}),({0,1},{2,3,4},{5,6,7, 8,9}), ({0, 1}, {2, 3, 4}, {5, 6}, {8, 9, 7}), ({0, 1}, {2}, {3, 4}, {5,6}, {8, 9, 7}), ({0, 1}, {2}, {3, 4}, {5, 6}, {7}, {8, 9}), ({0}, {1},{2}, {3, 4}, {5, 6}, {7}, {8, 9}), ({0}, {1}, {2}, {3}, {4}, {5, 6},{7}, {8, 9}), ({0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8, 9}), ({0},{1},{2},{3},{4},{5},{6},{7},{8},{9})]

问题是:如何绘制树状图?

我查看了

The Girvan-Newman algorithm for community detection in networks:

detects communities by progressively removing edges from the original graph. The algorithm removes the "most valuable" edge, traditionally the edge with the highest betweenness centrality, at each step. As the graph breaks down into pieces, the tightly knit community structure is exposed and the result can be depicted as a dendrogram.

In NetworkX the implementation returns an iterator over tuples of sets. First tuple is the first cut consisting of 2 communities, second tuple is the second cut consisting of 3 communities, etc., until the last tuple with n sets for n separate nodes (the leaves of the dendrogram).

import networkx as nx

G = nx.path_graph(10)
comp = nx.community.girvan_newman(G)
list(comp)

[({0, 1, 2, 3, 4}, {5, 6, 7, 8, 9}), ({0, 1}, {2, 3, 4}, {5, 6, 7, 8, 9}), ({0, 1}, {2, 3, 4}, {5, 6}, {8, 9, 7}), ({0, 1}, {2}, {3, 4}, {5, 6}, {8, 9, 7}), ({0, 1}, {2}, {3, 4}, {5, 6}, {7}, {8, 9}), ({0}, {1}, {2}, {3, 4}, {5, 6}, {7}, {8, 9}), ({0}, {1}, {2}, {3}, {4}, {5, 6}, {7}, {8, 9}), ({0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8, 9}), ({0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9})]

Question is: how to plot this dendrogram?

I've looked at scipy.cluster.hierarchy.dendrogram but it expects a "linkage matrix" I'm guessing such as the one created by scipy.cluster.hierarchy.linkage, but I'm not sure how I would convert this list of tuples into this "linkage matrix".

So I'm asking how to draw this dendrogram, with/without the help of SciPy's dendrogram.

解决方案

Following @ItamarMushkin I followed @mdml's answer with slight modifications and got what I wanted. At high level I'm turning NetworkX's Girvan-Newman iterator output into another DiGraph() I eventually want to see as a dendogram. Then I build Z, a linkage matrix I input to scipy.cluster.hierarchy.dendrogram, in the form of a edgelist that includes the actual height for each dendogram merge.

Two modifications I had to make to @mdml's answer:

  1. Not that important: I sort the tuple-keys of nodes entering index
  2. More important: I add a get_merge_height function, which gives for each merge its unique height according to Girvan-Newman output order of edges removal. Otherwise, all merges of two nodes would be the same height in the dendrogram, all merges in the next level of merging two nodes and another one would be the same height, etc.

I understand there may be some redundant iterations here, I haven't thought about optimization yet.

import networkx as nx
from itertools import chain, combinations
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram

# get simulated Graph() and Girvan-Newman communities list
G = nx.path_graph(10)
communities = list(nx.community.girvan_newman(G))

# building initial dict of node_id to each possible subset:
node_id = 0
init_node2community_dict = {node_id: communities[0][0].union(communities[0][1])}
for comm in communities:
    for subset in list(comm):
        if subset not in init_node2community_dict.values():
            node_id += 1
            init_node2community_dict[node_id] = subset

# turning this dictionary to the desired format in @mdml's answer
node_id_to_children = {e: [] for e in init_node2community_dict.keys()}
for node_id1, node_id2 in combinations(init_node2community_dict.keys(), 2):
    for node_id_parent, group in init_node2community_dict.items():
        if len(init_node2community_dict[node_id1].intersection(init_node2community_dict[node_id2])) == 0 and group == init_node2community_dict[node_id1].union(init_node2community_dict[node_id2]):
            node_id_to_children[node_id_parent].append(node_id1)
            node_id_to_children[node_id_parent].append(node_id2)

# also recording node_labels dict for the correct label for dendrogram leaves
node_labels = dict()
for node_id, group in init_node2community_dict.items():
    if len(group) == 1:
        node_labels[node_id] = list(group)[0]
    else:
        node_labels[node_id] = ''

# also needing a subset to rank dict to later know within all k-length merges which came first
subset_rank_dict = dict()
rank = 0
for e in communities[::-1]:
    for p in list(e):
        if tuple(p) not in subset_rank_dict:
            subset_rank_dict[tuple(sorted(p))] = rank
            rank += 1
subset_rank_dict[tuple(sorted(chain.from_iterable(communities[-1])))] = rank

# my function to get a merge height so that it is unique (probably not that efficient)
def get_merge_height(sub):
    sub_tuple = tuple(sorted([node_labels[i] for i in sub]))
    n = len(sub_tuple)
    other_same_len_merges = {k: v for k, v in subset_rank_dict.items() if len(k) == n}
    min_rank, max_rank = min(other_same_len_merges.values()), max(other_same_len_merges.values())
    range = (max_rank-min_rank) if max_rank > min_rank else 1
    return float(len(sub)) + 0.8 * (subset_rank_dict[sub_tuple] - min_rank) / range

# finally using @mdml's magic, slightly modified:
G           = nx.DiGraph(node_id_to_children)
nodes       = G.nodes()
leaves      = set( n for n in nodes if G.out_degree(n) == 0 )
inner_nodes = [ n for n in nodes if G.out_degree(n) > 0 ]

# Compute the size of each subtree
subtree = dict( (n, [n]) for n in leaves )
for u in inner_nodes:
    children = set()
    node_list = list(node_id_to_children[u])
    while len(node_list) > 0:
        v = node_list.pop(0)
        children.add( v )
        node_list += node_id_to_children[v]
    subtree[u] = sorted(children & leaves)

inner_nodes.sort(key=lambda n: len(subtree[n])) # <-- order inner nodes ascending by subtree size, root is last

# Construct the linkage matrix
leaves = sorted(leaves)
index  = dict( (tuple([n]), i) for i, n in enumerate(leaves) )
Z = []
k = len(leaves)
for i, n in enumerate(inner_nodes):
    children = node_id_to_children[n]
    x = children[0]
    for y in children[1:]:
        z = tuple(sorted(subtree[x] + subtree[y]))
        i, j = index[tuple(sorted(subtree[x]))], index[tuple(sorted(subtree[y]))]
        Z.append([i, j, get_merge_height(subtree[n]), len(z)]) # <-- float is required by the dendrogram function
        index[z] = k
        subtree[z] = list(z)
        x = z
        k += 1

# dendrogram
plt.figure()
dendrogram(Z, labels=[node_labels[node_id] for node_id in leaves])
plt.savefig('dendrogram.png')

这篇关于绘制由 NetworkX Girvan-Newman 算法找到的社区的树状图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆