What is the most efficient graph data structure in Python?
I need to be able to manipulate a large (10^7 nodes) graph in Python. The data corresponding to each node/edge is minimal, say, a small number of strings. What is the most efficient way of doing this, in terms of memory and speed?
A dict of dicts is more flexible and simpler to implement, but I intuitively expect a list of lists to be faster. The list option would also require that I keep the data separate from the structure, while dicts would allow for something of the sort:
graph[I][J]["Property"]="value"
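To make the two layouts concrete, here is a minimal sketch on a toy three-node graph. The names `adjacency`, `edge_data`, `neighbours_dict` and `neighbours_list` are hypothetical, chosen only for illustration; the sketch shows the shape of each structure and does not measure which one is faster.

```python
# Hypothetical toy graph: 3 nodes, undirected edges 0-1 and 1-2.

# Layout 1: dict of dicts. Edge data lives inside the structure itself.
# Note that graph[0][1] and graph[1][0] are separate dicts in this sketch.
graph = {0: {1: {}}, 1: {0: {}, 2: {}}, 2: {1: {}}}
graph[0][1]["Property"] = "value"

# Layout 2: list of lists. adjacency[i] holds the neighbours of node i,
# and edge data is kept apart in a mapping keyed by the (i, j) pair.
adjacency = [[1], [0, 2], [1]]
edge_data = {(0, 1): {"Property": "value"}}


def neighbours_dict(g, i):
    """Neighbours of node i in the dict-of-dicts layout."""
    return list(g[i])


def neighbours_list(adj, i):
    """Neighbours of node i in the list-of-lists layout."""
    return adj[i]


print(neighbours_dict(graph, 1))
print(neighbours_list(adjacency, 1))
print(graph[0][1]["Property"], edge_data[(0, 1)]["Property"])
```

The list layout only works directly when nodes are numbered 0..n-1; with string node names it needs an extra name-to-index mapping, which is part of the "data separate from the structure" cost mentioned above.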
What would you suggest?
Yes, I should have been a bit clearer on what I mean by efficiency. In this particular case I mean it in terms of random access retrieval.
Loading the data into memory isn't a huge problem. That's done once and for all. The time-consuming part is visiting the nodes so I can extract the information and measure the metrics I'm interested in.
I hadn't considered making each node a class (properties are the same for all nodes) but it seems like that would add an extra layer of overhead? I was hoping someone would have some direct experience with a similar case that they could share. After all, graphs are one of the most common abstractions in CS.
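One rough way to gauge the per-node overhead of a class is to compare an ordinary class with one that declares `__slots__`, a standard Python feature that drops the per-instance attribute dict. The class names `PlainNode` and `SlottedNode` and the two fields are assumptions made for this sketch, and the byte counts it prints vary between Python versions.

```python
import sys


class PlainNode:
    """Ordinary class: each instance carries its own attribute dict."""

    def __init__(self, name, label):
        self.name = name
        self.label = label


class SlottedNode:
    """Same fields, but __slots__ removes the per-instance dict."""

    __slots__ = ("name", "label")

    def __init__(self, name, label):
        self.name = name
        self.label = label


plain = PlainNode("n0", "start")
slotted = SlottedNode("n0", "start")

# sys.getsizeof is shallow, so the instance dict of the plain class
# is added explicitly; the strings themselves are not counted.
plain_bytes = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_bytes = sys.getsizeof(slotted)
print("plain:", plain_bytes, "bytes; slotted:", slotted_bytes, "bytes")
```

This is only a shallow size estimate for a single node, not a benchmark of random access on a 10^7-node graph.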
I would strongly advocate you look at NetworkX. It's a battle-tested war horse and the first tool most 'research' types reach for when they need to analyze network-based data. I have manipulated graphs with hundreds of thousands of edges without problems on a notebook. It's feature-rich and very easy to use. You will find yourself focusing more on the problem at hand than on the details of the underlying implementation.
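As a minimal sketch of how the `graph[I][J]["Property"]` idiom from the question carries over, NetworkX stores a graph as nested dictionaries internally, so the same subscript syntax works on a `Graph` object. The node names and attribute values below are made up for illustration, and the sketch assumes NetworkX 2.0 or later.

```python
import networkx as nx

G = nx.Graph()

# Node and edge attributes are passed as keyword arguments.
G.add_node("a", label="start")
G.add_edge("a", "b", Property="value")

# Edge data is reachable with the nested-dict syntax from the question.
G["a"]["b"]["Property"] = "new value"

print(G.nodes["a"]["label"])
print(G["a"]["b"]["Property"])
print(list(G.neighbors("a")))
```

Because the graph is undirected, `G["b"]["a"]` refers to the same attribute dict as `G["a"]["b"]`.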
Example of Erdős-Rényi random graph generation and analysis
"""
Create a G{n,m} random graph with n nodes and m edges
and report some properties.
This graph is sometimes called the Erdős-Rényi graph
but is different from G{n,p} or binomial_graph, which is also
sometimes called the Erdős-Rényi graph.
"""
__author__ = """Aric Hagberg (hagberg@lanl.gov)"""
__credits__ = """"""
# Copyright (C) 2004-2006 by
# Aric Hagberg
# Dan Schult
# Pieter Swart
# Distributed under the terms of the GNU Lesser General Public License
# http://www.gnu.org/copyleft/lesser.html
import networkx as nx

n = 10  # 10 nodes
m = 20  # 20 edges
G = nx.gnm_random_graph(n, m)

# some properties: one row per node
print("node degree clustering")
for v in G.nodes():
    print(v, G.degree(v), nx.clustering(G, v))

# print the adjacency list to the terminal
for line in nx.generate_adjlist(G):
    print(line)
Visualizations are also straightforward:
More visualization: http://jonschull.blogspot.com/2008/08/graph-visualization.html