有效生成具有用户指定的全局聚类系数的随机图 [英] Efficiently generating random graphs with a user-specified global clustering coefficient

查看:47
本文介绍了有效生成具有用户指定的全局聚类系数的随机图的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在研究大型神经网络的仿真,为此,我需要生成代表网络拓扑的随机图.

我希望能够指定这些图的以下属性:

  • 节点数 N (〜= 1000-10000)
  • 任意两个给定节点之间的平均连接概率 p (〜0.01-0.2)
  • 全局聚类系数 C (〜0.1-0.5)

理想情况下,应从满足这些用户指定条件的所有可能图形的集合中均匀地绘制随机图形.

目前,我使用的是非常粗糙的随机扩散方法,首先从具有所需大小和全局连接概率的Erdos-Renyi随机网络开始,然后在每一步中,我随机重新布线边缘的一部分.如果重新布线使我更接近所需的 C ,那么我会将重新布线的网络保留到下一个迭代中.

这是我当前的Python实现:

 导入图形将numpy导入为npdef generate_fixed_gcc(n,p,target_gcc,tol = 1E-3):"创建大小为n的Erdos-Renyi随机图,并带有指定的整体连接概率p,然后将其迭代地重新布线,以便实现用户指定的全局聚类系数."#初始化随机图G_best = igraph.Graph.Erdos_Renyi(n = n,p = p,有向=真,循环=假)loss_best = 1.n_edges = G_best.ecount()#从高重新布线率开始rewiring_rate = n_edgesn_iter = 0而loss_best>收费:#对当前最佳图形的副本进行操作G = G_best.copy()#根据电流调整要重新接线的连接数#最好的损失n_rewire = min(max(int(rewiring_rate * loss_best),1),n_edges)G.rewire(n = n_rewire)#计算全局聚类系数gcc = G.transitivity_undirected()损失= abs(gcc-target_gcc)#我们有所改善吗?如果损失<loss_best:#保留新图G_best = Gloss_best =损失gcc_best = gcc#提高接线率rewiring_rate * = 1.1别的:#降低重新接线率rewiring_rate * = 0.9n_iter + = 1#获取邻接矩阵作为布尔numpy数组M = np.array(G_best.get_adjacency().data,dtype = np.bool)返回M,n_iter,gcc_best 

这对于小型网络( N <500)是可行的,但是随着节点数量的增加,它很快变得难以处理.生成200个节点图大约需要20秒,而生成1000个节点图则需要几天的时间.

有人可以建议一种有效的方法吗?

解决方案

经过一些阅读,似乎最好的解决方案可能是,但是由于种种原因,我最终编写了自己的版本,您可以在这里找到.

性能

与我的幼稚扩散方法相比,Bansal方法只需要一半以上的迭代次数和一半的总时间​​:

我希望获得更大的收益,但是2倍的加速总比没有好.

泛化为有向图

Bansal方法面临的另一个挑战是我的图是有向的,而Bansal等人的算法仅设计用于无向图.对于有向图,不再保证重新布线步骤可以保留进度和出度顺序.


更新

我刚刚想出了如何推广Bansal方法以保留有向图的入度和出度序列的​​方法.技巧是选择要互换的两个向外边缘具有相反方向的图案({x,y1}和{x,y2}之间的边缘方向无关紧要):

我还做了一些其他的优化,并且性能似乎开始变得更加受人尊敬了-与扩散方法相比,它大约需要一半的迭代次数和一半的总时间​​.我已经使用新的时间更新了上面的图表.

I'm working on simulations of large-scale neuronal networks, for which I need to generate random graphs that represent the network topology.

I'd like to be able to specify the following properties of these graphs:

  • Number of nodes, N (~=1000-10000)
  • Average probability of a connection between any two given nodes, p (~0.01-0.2)
  • Global clustering coefficient, C (~0.1-0.5)

Ideally, the random graphs should be drawn uniformly from the set of all possible graphs that satisfy these user-specified criteria.

At the moment I'm using a very crude random diffusion approach where I start out with an Erdos-Renyi random network with the desired size and global connection probability, then on each step I randomly rewire some fraction of the edges. If the rewiring got me closer to the desired C then I keep the rewired network into the next iteration.

Here's my current Python implementation:

import igraph
import numpy as np

def generate_fixed_gcc(n, p, target_gcc, tol=1E-3):
    """
    Creates an Erdos-Renyi random graph of size n with a specified global
    connection probability p, which is then iteratively rewired in order to
    achieve a user- specified global clustering coefficient.
    """

    # initialize random graph
    G_best = igraph.Graph.Erdos_Renyi(n=n, p=p, directed=True, loops=False)

    loss_best = 1.
    n_edges = G_best.ecount()

    # start with a high rewiring rate
    rewiring_rate = n_edges
    n_iter = 0

    while loss_best > tol:

        # operate on a copy of the current best graph
        G = G_best.copy()

        # adjust the number of connections to rewire according to the current
        # best loss
        n_rewire = min(max(int(rewiring_rate * loss_best), 1), n_edges)
        G.rewire(n=n_rewire)

        # compute the global clustering coefficient
        gcc = G.transitivity_undirected()
        loss = abs(gcc - target_gcc)

        # did we improve?
        if loss < loss_best:

            # keep the new graph
            G_best = G
            loss_best = loss
            gcc_best = gcc

            # increase the rewiring rate
            rewiring_rate *= 1.1

        else:

            # reduce the rewiring rate
            rewiring_rate *= 0.9

        n_iter += 1

    # get adjacency matrix as a boolean numpy array
    M = np.array(G_best.get_adjacency().data, dtype=np.bool)

    return M, n_iter, gcc_best

This is works OK for small networks (N < 500), but it quickly becomes intractable as the number of nodes increases. It takes on the order of about 20 sec to generate a 200 node graph, and several days to generate a 1000 node graph.

Can anyone suggest an efficient way to do this?

解决方案

Having done a bit of reading, it looks as though the best solution might be the generalized version of Gleeson's algorithm presented in this paper. However, I still don't really understand how to implement it, so for the time being I've been working on Bansal et al's algorithm.

Like my naive approach, this is a Markov chain-based method that uses random edge swaps, but unlike mine it specifically targets 'triplet motifs' within the graph for rewiring:

Since this will have a greater tendency to introduce triangles, it will therefore have a greater impact on the clustering coefficient. At least in the case of undirected graphs, the rewiring step is also guaranteed to preserve the degree sequence. Again, on every rewiring iteration the new global clustering coefficient is measured, and the new graph is accepted if the GCC got closer to the target value.

Bansal et al actually provided a Python implementation, but for various reasons I ended up writing my own version, which you can find here.

Performance

The Bansal approach takes just over half the number of iterations and half the total time compared with my naive diffusion method:

I was hoping for bigger gains, but a 2x speedup is better than nothing.

Generalizing to directed graphs

One remaining challenge with the Bansal method is that my graphs are directed, whereas Bansal et al's algorithm is only designed to work on undirected graphs. With a directed graph, the rewiring step is no longer guaranteed to preserve the in- and out-degree sequences.


Update

I've just figured out how to generalize the Bansal method to preserve both the in- and out-degree sequences for directed graphs. The trick is to select motifs where the two outward edges to be swapped have opposite directions (the directions of the edges between {x, y1} and {x, y2} don't matter):

I've also made some more optimizations, and the performance is starting to look a bit more respectable - it takes roughly half the number of iterations and half the total time compared with the diffusion approach. I've updated the graphs above with the new timings.

这篇关于有效生成具有用户指定的全局聚类系数的随机图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆