在NetworkX中合并两个加权图 [英] Combine two weighted graphs in NetworkX

查看:80
本文介绍了在NetworkX中合并两个加权图的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用python多重处理来创建多个不同的NetworkX图,然后使用下面的函数来组合图.但是,尽管此功能对于较小的图形来说效果很好,但对于较大的图形,它会使用大量内存,并且会挂在我的系统和大量内存的AWS系统上(仅使用系统总内存的三分之一).有没有更有效的方法来执行以下功能?

I'm using python multiprocessing to create multiple different NetworkX graphs and then using the below function to combine the graphs. However, while this function works fine for small graphs, for larger graphs it uses a huge amount of memory and hangs on both my system and a memory-heavy AWS system (using only about a third of the total memory in the system). Is there a more efficient way to perform the following function?

def combine_graphs(graph1, graph2, graph2_weight = 1):
    '''
    Given two graphs of different edge (but same node) structure (and the same type),
    combine the two graphs, summing all edge attributes and multiplying the second one's
    attributes by the desired weights. 

    E.g. if graph1.edge[a][b] = {'a': 1, 'b':2} and 
    graph2.edge[a][b] = {'a': 3, 'c': 4}, 
    with a weight of 1 the final graph edge should be 
    final_graph.edge[a][b] = {'a': 4, 'b': 2, 'c': 4} and with a weight 
    of .5 the final graph edge should be {'a': 2.5, 'b': 2, 'c': 2}.

    Inputs: Two graphs to be combined and a weight to give to the second graph
    '''

    if type(graph1) != type(graph2) or len(set(graph2.nodes()) - set(graph1.nodes())) > 0:
        raise Exception('Graphs must have the same type and graph 2 cannot have nodes that graph 1 does not have.')

    # make a copy of the new graph to ensure that it doesn't change
    new_graph = graph1.copy()

    # iterate over graph2's edges, adding them to graph1
    for node1, node2 in graph2.edges():
        # if that edge already exists, now iterate over the attributes
        if new_graph.has_edge(node1, node2):
            for attr in graph2.edge[node1][node2]:
                # if that attribute exists, sum the values, otherwise, simply copy attrs
                if new_graph.edge[node1][node2].get(attr) is not None:
                    # try adding weighted value: if it fails, it's probably not numeric so add the full value (the only other option is a list)
                    try:
                        new_graph.edge[node1][node2][attr] += graph2.edge[node1][node2][attr] * graph2_weight
                    except:
                        new_graph.edge[node1][node2][attr] += graph2.edge[node1][node2][attr]
                else:
                    try:
                        new_graph.edge[node1][node2][attr] = graph2.edge[node1][node2][attr] * graph2_weight
                    except:
                        new_graph.edge[node1][node2][attr] = graph2.edge[node1][node2][attr]

        # otherwise, add the new edge with all its atributes -- first, iterate through those attributes to weight them
        else:
            attr_dict = graph2.edge[node1][node2]
            for item in attr_dict:
                try:
                    attr_dict[item] = attr_dict[item] * graph2_weight
                except:
                    continue
            new_graph.add_edge(node1, node2, attr_dict = attr_dict)

    return new_graph

推荐答案

代码中有两个地方会扩展内存:

There are two places where memory will expand in your code:

1)制作graph1的副本(虽然您可能需要保留一份副本)

1) making a copy of graph1 (perhaps you need to keep a copy though)

2)使用graph2.edges()创建内存中所有边缘的列表,graph2.edges_iter()遍历边缘而不创建新列表

2) using graph2.edges() makes a list of all edges in memory, graph2.edges_iter() iterates over the edges without creating a new list

您也可以通过对边缘数据进行不同的处理来加快处理速度.您可以在遍历边缘的同时获取数据对象,而不必像字典查找那样执行:

You can probably make it faster by handling the edge data differently too. You can get the data object while iterating over the edges and not have to perform as may dictionary lookups:

def combined_graphs_edges(G, H, weight = 1.0):
    for u,v,hdata in H.edges_iter(data=True):
        # multply attributes of H by weight
        attr = dict( (key, value*weight) for key,value in hdata.items())
        # get data from G or use empty dict if no edge in G
        gdata = G[u].get(v,{})
        # add data from g
        # sum shared items
        shared = set(gdata) & set(hdata)
        attr.update(dict((key, attr[key] + gdata[key]) for key in shared))
        # non shared items
        non_shared = set(gdata) - set(hdata)
        attr.update(dict((key, gdata[key]) for key in non_shared))
        yield u,v,attr
    return


if __name__ == '__main__':
    import networkx as nx
    G = nx.Graph([('a','b', {'a': 1, 'b':2})])
    H = nx.Graph([('a','b', {'a': 3, 'c':4})])
    print list(combined_graphs_edges(G,H,weight=0.5))
    # or to make a new graph 
    graph = G.copy()
    graph.add_edges_from(combined_graphs_edges(G,H,weight=0.5))

这篇关于在NetworkX中合并两个加权图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆