查找非常大的图的组成部分 [英] Finding components of very large graph

查看:79
本文介绍了查找非常大的图的组成部分的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个非常大的图形,它以大约1TB的大小的文本文件表示,每个边缘如下.

I have a very large graph represented in a text file of size about 1TB with each edge as follows.

From-node to-node

我想将其拆分为连接较弱的组件.如果较小,则可以将其加载到networkx并运行其组件查找算法.例如 http://networkx.github.io/documentation/latest/reference/generation/networkx.algorithms.components.connected.connected_components.html#networkx.algorithms.components.connected.connected_components

I would like to split it into its weakly connected components. If it was smaller I could load it into networkx and run their component finding algorithms. For example http://networkx.github.io/documentation/latest/reference/generated/networkx.algorithms.components.connected.connected_components.html#networkx.algorithms.components.connected.connected_components

有什么方法可以将整个内容都加载到内存中吗?

Is there any way to do this without loading the whole thing into memory?

推荐答案

如果甚至个节点的数量太大而无法容纳在内存中,您也可以进行分而治之,并使用外部内存进行排序您的大部分工作(例如Windows和Unix附带的 sort 命令可以对大于内存的文件进行排序):

If even the number of nodes is too large to fit in memory, you can divide and conquer and use external memory sorts to do most of the work for you (e.g. the sort command included with Windows and Unix can sort files much larger than memory):

  1. 选择一些阈值顶点k.
  2. 读取原始文件,并将其每个边缘写入3个文件之一:
    • 如果 a 的最大顶点数是<k
    • 如果
    • b b 的最小编号顶点是> = k
    • 否则为 c (即,如果它具有一个< k顶点和一个顶点> = k的顶点)
  1. Choose some threshold vertex k.
  2. Read the original file and write each of its edges to one of 3 files:
    • To a if its maximum-numbered vertex is < k
    • To b if its minimum-numbered vertex is >= k
    • To c otherwise (i.e. if it has one vertex < k and one vertex >= k)

上面使用的所有线性时间合并操作都可以直接从磁盘文件中读取,因为它们只以递增的顺序访问每个列表中的项目(即不需要缓慢的随机访问).

All the linear-time merge operations used above can read directly from disk files, since they only ever access items from each list in increasing order (i.e. no slow random access is needed).

这篇关于查找非常大的图的组成部分的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆