NetworkX 有哪些可扩展性问题? [英] What scalability issues are associated with NetworkX?

查看：23 发布时间：2021/12/24 14:45:58 python graph social-networking libraries

本文介绍了NetworkX 有哪些可扩展性问题?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我对具有数百万个节点和数千万条边的大型网络的网络分析感兴趣.我希望能够执行诸如从多种格式解析网络、查找连接组件、检测社区以及运行诸如 PageRank 之类的中心性度量之类的操作.

I'm interested in network analysis on large networks with millions of nodes and tens of millions of edges. I want to be able to do things like parse networks from many formats, find connected components, detect communities, and run centrality measures like PageRank.

我被 NetworkX 吸引是因为它有一个很好的 API、很好的文档，并且多年来一直在积极开发中.另外，因为它是用python编写的，所以开发起来应该很快.

I am attracted to NetworkX because it has a nice api, good documentation, and has been under active development for years. Plus because it is in python, it should be quick to develop with.

在最近的一次演示中(可在 github 此处上找到幻灯片)，声称:

In a recent presentation (the slides are available on github here), it was claimed that:

与许多其他工具不同，NX 旨在大规模处理数据与现代问题相关……NX 中的大多数核心算法都依赖于极快的遗留代码.

Unlike many other tools, NX is designed to handle data on a scale relevant to modern problems...Most of the core algorithms in NX rely on extremely fast legacy code.

演示文稿还指出，NetworkX 的基本算法是在 C/Fortran 中实现的.

The presentation also states that the base algorithms of NetworkX are implemented in C/Fortran.

不过看源码，貌似NetworkX大多是用python写的.我对源代码不太熟悉，但我知道几个例子，其中 NetworkX 使用 numpy 来做繁重的工作(反过来使用 C/Fortran 来做线性代数).例如，文件 networkx/networkx/algorithms/centrality/eigenvector.py 使用 numpy 来计算特征向量.

However, looking at the source code, it looks like NetworkX is mostly written in python. I am not too familiar with the source code, but I am aware of a couple of examples where NetworkX uses numpy to do heavy lifting (which in turn uses C/Fortran to do linear algebra). For example, the file networkx/networkx/algorithms/centrality/eigenvector.py uses numpy to calculate eigenvectors.

有谁知道这种调用像 numpy 这样的优化库的策略是否真的在整个 NetworkX 中普遍存在，或者是否只有少数算法这样做?任何人都可以描述与 NetworkX 相关的其他可扩展性问题吗?

Does anyone know if this strategy of calling an optimized library like numpy is really prevalent throughout NetworkX, or if just a few algorithms do it? Also can anyone describe other scalability issues associated with NetworkX?

来自 NetworkX 首席程序员的回复我在 NetworkX 邮件列表上提出了这个问题，Aric Hagberg 回复了:

Reply from NetworkX Lead Programmer I posed this question on the NetworkX mailing list, and Aric Hagberg replied:

NetworkX 中使用的数据结构适用于扩展到大问题(例如数据结构是一个邻接表).这算法具有各种缩放属性，但其中一些您提及是可用的(例如 PageRank、连接的组件、是线性的边数的复杂度).

The data structures used in NetworkX are appropriate for scaling to large problems (e.g. the data structure is an adjacency list). The algorithms have various scaling properties but some of the ones you mention are usable (e.g. PageRank, connected components, are linear complexity in the number of edges).

此时 NetworkX 是纯 Python 代码.邻接结构使用 Python 字典编码，提供了极大的灵活性以牺牲内存和计算速度为代价.大图将占用大量内存，最终会耗尽.

At this point NetworkX is pure Python code. The adjacency structure is encoded with Python dictionaries which provides great flexibility at the expense of memory and computational speed. Large graphs will take a lot of memory and you will eventually run out.

NetworkX 确实将 NumPy 和 SciPy 用于主要用于基于线性代数.在这种情况下，图形表示(复制)作为使用 NumPy 矩阵或 SciPy 的邻接矩阵稀疏矩阵.这些算法可以从遗留的 C 和在 NumPy 和 SciPY 中使用的 FORTRAN 代码.

NetworkX does use NumPy and SciPy for algorithms that are primarily based on linear algebra. In that case the graph is represented (copied) as an adjacency matrix using either NumPy matrices or SciPy sparse matrices. Those algorithms can benefit from the legacy C and FORTRAN code that is used under the hood in NumPy and SciPY.

NetworkX 有哪些可扩展性问题? [英] What scalability issues are associated with NetworkX?

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

NetworkX 有哪些可扩展性问题? [英] What scalability issues are associated with NetworkX?

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭