图中具有1亿个节点的连接组件 [英] Connected components in a graph with 100 million nodes

查看:73
本文介绍了图中具有1亿个节点的连接组件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试在具有1亿个节点的图形中获取连接的组件的列表.对于较小的图,我通常使用 connected_components 函数正是这样做的.但是,使用此模块将具有1亿个节点(及其边缘)的图形加载到内存中将需要大约. 110GB的内存,我没有.一种替代方法是使用具有连接的组件功能的图形数据库,但我在Python中没有找到任何数据库.似乎Dex(API:Java,.NET,C ++)具有此功能,但我不确定100%.理想情况下,我正在寻找Python中的解决方案.非常感谢.

I am trying to get the list of connected components in a graph with 100 million nodes. For smaller graphs, I usually use the connected_components function of the Networkx module in Python which does exactly that. However, loading a graph with 100 million nodes (and their edges) into memory with this module would require ca. 110GB of memory, which I don't have. An alternative would be to use a graph database which has a connected components function but I haven't found any in Python. It would seem that Dex (API: Java, .NET, C++) has this functionality but I'm not 100% sure. Ideally I'm looking for a solution in Python. Many thanks.

推荐答案

SciPy具有稀疏矩阵格式之一来输入图的邻接矩阵.有针对性和无针对性的案例.

SciPy has a connected components algorithm. It expects as input the adjacency matrix of your graph in one of its sparse matrix formats and handles both the directed and undirected cases.

根据(i, j)adj_list的序列构建稀疏邻接矩阵,其中ij是节点(从零开始)的索引,可以通过以下方式完成

Building a sparse adjacency matrix from a sequence of (i, j) pairs adj_list where i and j are (zero-based) indices of nodes can be done with

i_indices, j_indices = zip(*adj_list)
adj_matrix = scipy.sparse.coo_matrix((np.ones(number_of_nodes),
                                     (i_indices, j_indices)))

对于非定向案例,您将不得不做一些额外的工作.

You'll have to do some extra work for the undirected case.

如果您的图形足够稀疏,则此方法应该有效.

This approach should be efficient if your graph is sparse enough.

这篇关于图中具有1亿个节点的连接组件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆