图中具有1亿个节点的连接组件 [英] Connected components in a graph with 100 million nodes
问题描述
我正在尝试在具有1亿个节点的图形中获取连接的组件的列表.对于较小的图,我通常使用 connected_components 函数正是这样做的.但是,使用此模块将具有1亿个节点(及其边缘)的图形加载到内存中将需要大约. 110GB的内存,我没有.一种替代方法是使用具有连接的组件功能的图形数据库,但我在Python中没有找到任何数据库.似乎Dex(API:Java,.NET,C ++)具有此功能,但我不确定100%.理想情况下,我正在寻找Python中的解决方案.非常感谢.
I am trying to get the list of connected components in a graph with 100 million nodes. For smaller graphs, I usually use the connected_components function of the Networkx module in Python which does exactly that. However, loading a graph with 100 million nodes (and their edges) into memory with this module would require ca. 110GB of memory, which I don't have. An alternative would be to use a graph database which has a connected components function but I haven't found any in Python. It would seem that Dex (API: Java, .NET, C++) has this functionality but I'm not 100% sure. Ideally I'm looking for a solution in Python. Many thanks.
推荐答案
SciPy具有稀疏矩阵格式之一来输入图的邻接矩阵.有针对性和无针对性的案例.
SciPy has a connected components algorithm. It expects as input the adjacency matrix of your graph in one of its sparse matrix formats and handles both the directed and undirected cases.
根据(i, j)
对adj_list
的序列构建稀疏邻接矩阵,其中i
和j
是节点(从零开始)的索引,可以通过以下方式完成
Building a sparse adjacency matrix from a sequence of (i, j)
pairs adj_list
where i
and j
are (zero-based) indices of nodes can be done with
i_indices, j_indices = zip(*adj_list)
adj_matrix = scipy.sparse.coo_matrix((np.ones(number_of_nodes),
(i_indices, j_indices)))
对于非定向案例,您将不得不做一些额外的工作.
You'll have to do some extra work for the undirected case.
如果您的图形足够稀疏,则此方法应该有效.
This approach should be efficient if your graph is sparse enough.
这篇关于图中具有1亿个节点的连接组件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!