Python:有向图中的所有简单路径 [英] Python: all simple paths in a directed graph

查看:49
本文介绍了Python:有向图中的所有简单路径的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在处理(多个)有向图,其中没有循环,我需要找到任意两个节点之间的所有简单路径.一般来说,我不会担心执行时间,但我必须在非常多的时间步中为非常多的节点执行此操作 - 我正在处理基于时间的模拟.

我过去曾尝试过 NetworkX 提供的工具,但总的来说,我发现它们比我的方法慢.不知道最近有没有变化.

我已经实现了这个递归函数:

导入时间def all_simple_paths(adjlist, start, end, path):路径 = 路径 + [开始]如果开始 == 结束:返回 [路径]路径 = []对于 adjlist[start] 中的孩子:如果孩子不在路径中:child_paths = all_simple_paths(adjlist, child, end, path)路径.扩展(child_paths)返回路径fid = open('digraph.txt', 'rt')adjlist = eval(fid.read().strip())数字 = 1000stmnt = 'all_simple_paths(adjlist, 166, 180, [])'setup = 'from __main__ import all_simple_paths, adjlist'elapsed = timeit.timeit(stmnt, setup=setup, number=number)/number打印 'Elapsed: %0.2f ms'%(1000*elapsed)

在我的计算机上,每次迭代平均需要 1.5 毫秒.我知道这是一个小数字,但我必须非常多次执行此操作.

如果你有兴趣,我在这里上传了一个包含邻接列表的小文件:

Python 有一个用于此任务的内置装饰器 lru_cache.它使用哈希来记住参数,因此您需要将 adjListpath 更改为 tuple 因为 list 不是可散列.

导入时间导入功能工具@functools.lru_cache()def all_simple_paths(adjlist, start, end, path):路径=路径+(开始,)如果开始 == 结束:返回 [路径]路径 = []对于 adjlist[start] 中的孩子:如果孩子不在路径中:child_paths = all_simple_paths(tuple(adjlist), child, end, path)路径.扩展(child_paths)返回路径fid = open('digraph.txt', 'rt')adjlist = eval(fid.read().strip())# 你也可以改变你在txt中的数据格式adjlist = tuple(tuple(pair)for pair in adjlist)数字 = 1000stmnt = 'all_simple_paths(adjlist, 166, 180, ())'setup = 'from __main__ import all_simple_paths, adjlist'elapsed = timeit.timeit(stmnt, setup=setup, number=number)/number打印('经过:%0.2f毫秒'%(1000 *经过))

在我的机器上运行时间:
- 原始:0.86ms
- 带缓存:0.01ms

这种方法应该只在有很多共享的子问题时才有效.

I am working with a (number of) directed graphs with no cycles in them, and I have the need to find all simple paths between any two nodes. In general I wouldn't worry about the execution time, but I have to do this for very many nodes during very many timesteps - I am dealing with a time-based simulation.

I had tried in the past the facilities offered by NetworkX but in general I found them slower than my approach. Not sure if anything has changed lately.

I have implemented this recursive function:

import timeit

def all_simple_paths(adjlist, start, end, path):

    path = path + [start]

    if start == end:
        return [path]

    paths = []

    for child in adjlist[start]:

        if child not in path:

            child_paths = all_simple_paths(adjlist, child, end, path)
            paths.extend(child_paths)

    return paths


fid = open('digraph.txt', 'rt')
adjlist = eval(fid.read().strip())

number = 1000
stmnt  = 'all_simple_paths(adjlist, 166, 180, [])'
setup  = 'from __main__ import all_simple_paths, adjlist'
elapsed = timeit.timeit(stmnt, setup=setup, number=number)/number
print 'Elapsed: %0.2f ms'%(1000*elapsed)

On my computer, I get an average of 1.5 ms per iteration. I know this is a small number, but I have to do this operation very many times.

In case you're interested, I have uploaded a small file containing the adjacency list here:

adjlist

I am using adjacency lists as inputs, coming from a NetworkX DiGraph representation.

Any suggestion for improvements of the algorithm (i.e., does it have to be recursive?) or other approaches I may try are more than welcome.

Thank you.

Andrea.

解决方案

You can save time without change the algorithm logic by caching result of shared sub-problems here.

For example, calling all_simple_paths(adjlist, 'A', 'D', []) in following graph will compute all_simple_paths(adjlist, 'D', 'E', []) multiple times:

Python has a built-in decorator lru_cache for this task. It uses hash to memorize the parameters so you will need to change adjList and path to tuple since list is not hashable.

import timeit
import functools

@functools.lru_cache()
def all_simple_paths(adjlist, start, end, path):

    path = path + (start,)

    if start == end:
        return [path]

    paths = []

    for child in adjlist[start]:

        if child not in path:

            child_paths = all_simple_paths(tuple(adjlist), child, end, path)
            paths.extend(child_paths)

    return paths


fid = open('digraph.txt', 'rt')
adjlist = eval(fid.read().strip())

# you can also change your data format in txt
adjlist = tuple(tuple(pair)for pair in adjlist)

number = 1000
stmnt  = 'all_simple_paths(adjlist, 166, 180, ())'
setup  = 'from __main__ import all_simple_paths, adjlist'
elapsed = timeit.timeit(stmnt, setup=setup, number=number)/number
print('Elapsed: %0.2f ms'%(1000*elapsed))

Running time on my machine:
- original: 0.86ms
- with cache: 0.01ms

And this method should only work when there's a lot shared sub-problems.

这篇关于Python:有向图中的所有简单路径的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆