使用Python实现Kahn的拓扑排序算法 [英] Implement Kahn's topological sorting algorithm using Python

查看:201
本文介绍了使用Python实现Kahn的拓扑排序算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Kahn在62中提出了一种算法来对任何DAG(有向无环图)进行拓扑排序,从维基百科复制的伪代码:

  L←将包含排序元素的空列表
S←所有节点的集合没有传入边
,而S不是空do
从S
中删除一个节点n将n添加到L
的尾部,每个节点m的边e从n到m do
从图中删除边e#这是一个破坏性的步骤!
如果m没有其他输入边,那么
如果图有边,则将m插入到S中,然后
返回错误(图中至少有一个循环)else
return L(拓扑排序订单)

我需要使用IPython3来实现它,以下是DAG的实现:

  class节点(对象):
def __init __(self,name,parents):
assert isinstance(name, str)
assert all(isinstance(_,RandomVariable)for _ in parents)
self.name,self.parents = name,parents

其中 name 是节点的标签,父母存储其所有父节点。然后DAG类被实现为:

$ p $ class DAG(object):
def __init __(self,* nodes) :
在节点中断言全部(isinstance(_,Node))_
self.nodes =节点

(DAG实现是固定的,不会改进。)然后,我需要将Kahn算法实现为一个函数 top_order ,它接受一个DAG实例并返回一个像(node_1,node_2,...,node_n)这样的顺序。主要的问题是,这个算法是破坏性的,因为它的一个步骤是从图中删除边(第5行),这会删​​除 m.parents 。但是,我必须保持DAG实例不变



到目前为止,我可以想到的一种方法是创建一个 deep < (即使是浅拷贝也无法完成这项工作,因为算法仍然通过引用破坏原始实例),并且在这个拷贝上执行破坏性算法,然后得到正确的节点排序此副本的名称(假设节点之间没有命名冲突),然后使用此名称顺序来推断原始实例的节点的正确排序,大致如下所示:

  def top_order(网络):
'''接受DAG,打印并返回拓扑排序。'''
断言类型)== DAG
temp = copy.deepcopy(network)#将原始实例保持原样

ordering_name = []
roots = [node for temp.nodes if不是node.parents]
,而root:
n_node = roots [0]
del根[0]
orders_name.append(n_node.name)
在temp.nodes中的m_node:
如果n_node在m_node.parents中:
temp_list = list(m_node.parents )
temp_list.remove(n_node)
m_node.parents =元组(temp_list)
如果不是m_node.parents:
roots.append(m_node)

print(orders_name)#打印按名称排序

#获取原始实例节点的顺序
orders = []
在orders_name中的名称:
为节点在network.nodes中:
if node.name == name:
ordering.append(node)

返回元组(订购)

两个问题:首先,当 network 巨大时,深度复制将耗费资源;第二,我希望改进我的嵌套 for 循环,以获得原始实例的排序。 (对于第二种情况,我认为类似于 sorted 方法等东西会出现在我脑海中。)



任何建议? / p>

解决方案

我将会建议一个不太符合实际的算法:您根本不需要操作DAG,您只需要操作有关DAG的info 。算法需要的唯一有趣的东西是从节点到其子节点的映射(与您的DAG实际存储的内容相反)以及每个节点父节点的数目。

这些都很容易计算,并且可以使用字典将此信息与节点名称相关联(假设所有名称都不相同 - 如果不是,则可以使用更多代码创建唯一名称)。 b
$ b

然后这应该工作:

  def topsort(dag):
name2node = {node.name:节点在dag.nodes中的节点}
#映射名称到前辈(父母)数量
name2npreds = {}
#映射名称到后继者列表(子女)
name2succs = {name:[] for name in name2node}

for dag.nodes中的节点:$ b​​ $ b thisname = node.name
name2npreds [thisname] = len (node.parents)
for node.parents:
name2succs [p.name] .append(thisname)

result = [n for n,npreds in name2npred s.items()if npreds == 0]
for p:
for c in name2succs [p]:
npreds = name2npreds [c]
assert npreds
npreds - = 1
name2npreds [c] = npreds
if npreds == 0:
result.append(c)

len(result)< ; len(name2node):
增加ValueError(no topsort - cycle)
返回元组(结果中的name2node [p])

这里有一个微妙的地方:外层循环附加到 result ,而 结果。这是故意的。结果是, result 中的每个元素仅由外层循环处理一次,无论元素是否位于初始结果中 DAG 节点<$ c>时, / code> s遍历,其中没有任何内容被更改。


Kahn proposed an algorithm in 62 to topologically sort any DAG (directed acyclic graph), pseudo code copied from Wikipedia:

L ← Empty list that will contain the sorted elements 
S ← Set of all nodes with no incoming edges 
while S is non-empty do
    remove a node n from S
    add n to tail of L
    for each node m with an edge e from n to m do
        remove edge e from the graph  # This is a DESTRUCTIVE step!
        if m has no other incoming edges then
            insert m into S if graph has edges then
    return error (graph has at least one cycle) else 
    return L (a topologically sorted order)

I need to implement it using IPython3, with the following implementation of a DAG:

class Node(object):
    def __init__(self, name, parents):
        assert isinstance(name, str)
        assert all(isinstance(_, RandomVariable) for _ in parents)
        self.name, self.parents = name, parents

where name is the label for the node and parents stores all of its parent nodes. Then the DAG class is implemented as:

class DAG(object):
    def __init__(self, *nodes):
        assert all(isinstance(_, Node) for _ in nodes)
        self.nodes = nodes

(The DAG implementation is fixed and not to be improved.) Then I need to implement Kahn's algorithm as a function top_order which takes in a DAG instance and returns an ordering like (node_1, node_2, ..., node_n). The main trouble is, this algorithm is destructive because one of its steps is remove edge e from the graph (line 5) which will delete one member of m.parents. However, I have to leave the DAG instance intact.

One way I can think of so far is to create a deep copy of the DAG instance taken in (even a shallow copy can't do the job because the algorithm still destroys the original instance via references), and perform the destructive algorithm on this copy, and then get the correct ordering of node names of this copy (assume there is no naming conflict between nodes), and then use this ordering of names to infer the correct ordering of the nodes of the original instance, which roughly goes like:

def top_order(network):
    '''takes in a DAG, prints and returns a topological ordering.'''
    assert type(network) == DAG
    temp = copy.deepcopy(network) # to leave the original instance intact

    ordering_name = []
    roots = [node for node in temp.nodes if not node.parents]
    while roots:
        n_node = roots[0]
        del roots[0]
        ordering_name.append(n_node.name)
        for m_node in temp.nodes:
            if n_node in m_node.parents:
                temp_list = list(m_node.parents)
                temp_list.remove(n_node)
                m_node.parents = tuple(temp_list)
                if not m_node.parents:
                    roots.append(m_node)

    print(ordering_name) # print ordering by name

    # gets ordering of nodes of the original instance
    ordering = []
    for name in ordering_name:
        for node in network.nodes:
            if node.name == name:
                ordering.append(node)

    return tuple(ordering)

Two problems: first, when network is huge the deep copy will be resource consuming; second, I want an improvement to my nested for loops which gets the ordering of the original instance. (For the second I think something like the sorted method etc pops into my mind.)

Any suggestion?

解决方案

I'm going to suggest a less literal implementation of the algorithm: you don't need to manipulate the DAG at all, you just need to manipulate info about the DAG. The only "interesting" things the algorithm needs are a mapping from a node to its children (the opposite of what your DAG actually stores), and a count of the number of each node's parents.

These are easy to compute, and dicts can be used to associate this info with node names (assuming all names are distinct - if not, you can invent unique names with a bit more code).

Then this should work:

def topsort(dag):
    name2node = {node.name: node for node in dag.nodes}
    # map name to number of predecessors (parents)
    name2npreds = {}
    # map name to list of successors (children)
    name2succs = {name: [] for name in name2node}

    for node in dag.nodes:
        thisname = node.name
        name2npreds[thisname] = len(node.parents)
        for p in node.parents:
            name2succs[p.name].append(thisname)

    result = [n for n, npreds in name2npreds.items() if npreds == 0]
    for p in result:
        for c in name2succs[p]:
            npreds = name2npreds[c]
            assert npreds
            npreds -= 1
            name2npreds[c] = npreds
            if npreds == 0:
                result.append(c)

    if len(result) < len(name2node):
        raise ValueError("no topsort - cycle")
    return tuple(name2node[p] for p in result)

There's one subtle point here: the outer loop appends to result while it's iterating over result. That's intentional. The effect is that every element in result is processed exactly once by the outer loop, regardless of whether an element was in the initial result or added later.

Note that while the input DAG and Nodes are traversed, nothing in them is altered.

这篇关于使用Python实现Kahn的拓扑排序算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆