以 Newick 格式懒惰地打印一棵树 [英] Printing a tree lazily in Newick format

查看:34
本文介绍了以 Newick 格式懒惰地打印一棵树的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我希望以 Newick 格式 打印二叉树,显示每个节点与其父节点的距离.目前我没有遇到以下代码的问题,它使用常规递归,但树太深可能会产生堆栈溢出.

I wish to print a binary tree in Newick format, showing each node's distance to its parent. At the moment I haven't had an issue with the following code, which uses regular recursion, but a tree too deep may produce a stack overflow.

(defn tree->newick
  [tree]
  (let [{:keys [id children to-parent]} tree
        dist (double to-parent)] ; to-parent may be a rational
    (if children
      (str "(" (tree->newick (first children)) 
           "," (tree->newick (second children)) 
           "):" dist)
      (str (name id) ":" dist))))

(def example {:id nil :to-parent 0.0 
              :children [{:id nil :to-parent 0.5 
                          :children [{:id "A" :to-parent 0.3 :children nil}
                                     {:id "B" :to-parent 0.2 :children nil}]}
                         {:id "C" :to-parent 0.8 :children nil}]})

(tree->newick example)
;=> "((A:0.3,B:0.2):0.5,C:0.8):0.0"

(def linear-tree (->> {:id "bottom" :to-parent 0.1 :children nil}
                   (iterate #(hash-map :id nil :to-parent 0.1 
                                       :children [% {:id "side" :to-parent 0.1 :children nil}]))
                   (take 10000)
                   last))

(tree->newick linear-tree)
;=> StackOverflowError

我在当前实用程序中发现的问题,例如 tree-seqclojure.walk,是我必须多次访问内部节点,插入逗号并关闭括号.我已经使用了 clojure.zip,但没有设法编写一个惰性/尾递归实现,因为我需要为每个内部节点存储它们已经被访问了多少次.>

The problem I've found with current utilities, such as tree-seq and clojure.walk, is that I have to visit an inner node more than once, to interpose the comma and close the bracket. I've used clojure.zip, but didn't manage to write a lazy/tail-recursive implementation, as I would need to store for each inner node how many times they were already visited.

推荐答案

这是一个适用于您的 linear-tree 示例的版本.这是您的实现的直接转换,有两个变化:它使用了延续传递风格和蹦床.

Here's a version that works on your linear-tree example. It's a direct conversion of your implementation with two changes: it uses continuation passing style and the trampoline.

(defn tree->newick
  ([tree]
     (trampoline tree->newick tree identity))
  ([tree cont]
     (let [{:keys [id children to-parent]} tree
           dist (double to-parent)]     ; to-parent may be a rational
       (if children
         (fn []
           (tree->newick
            (first children)
            (fn [s1] (fn []
                       (tree->newick
                        (second children)
                        (fn [s2] (cont (str "(" s1 "," s2 "):" dist))))))))
         (cont (str (name id) ":" dist))))))

编辑:添加模式匹配以允许以简单的方式调用函数.

Edit: added pattern matching to allow calling the function in a simple way.

编辑 2:我注意到我犯了错误.问题是我确实考虑到了 Clojure 并未仅部分优化尾调用这一事实.

Edit 2: I noticed that I made mistake. The problem is that I did take the fact that Clojure doesn't optimize tail calls only partially into account.

我的解决方案的核心思想是转换为延续传递风格,这样递归调用就可以移动到尾部位置(即不是返回结果,递归调用将其作为参数传递给延续).

The core idea of my solution is the transformation into continuation passing style so the recursive calls can be moved into tail position (i.e. instead of returning their result, the recursive calls pass it to the continuation as argument).

然后我通过让它们使用蹦床来手动优化递归调用.我忘记考虑的是,延续的调用——不是递归调用,而是在尾部位置——也需要优化,因为尾部调用可以是一个很长的闭包链,这样当函数最终评估它们,它变成了一长串调用.

Then I hand-optimized the recursive calls by making them use the trampoline. What I forgot to consider is that the calls of the continuations - which are not recursive calls but also in tail position - need to be optimized, too, because the tail calls can be a very long chain of closures, so that when the function finally evaluates them, it becomes a long chain of calls.

这个问题在测试数据linear-tree中没有出现,因为第一个孩子的延续返回到蹦床来处理第二个孩子的递归调用.但是如果 linear-tree 被改变,使它使用每个节点的第二个子节点而不是第一个子节点来构建线性树,这会再次导致堆栈溢出.

This problem did not materialize with the the test data linear-tree because the continuation for the first child returns to the trampoline to process recursive call for the second child. But if linear-tree is changed so that it uses the second child of every node to build the linear tree instead of the first child, this does again cause a stack overflow.

所以延续的调用也需要返回到蹦床.(实际上,在 no children 基本情况下的调用不会,因为它在返回到蹦床之前最多会发生一次,并且第二次递归调用也是如此.)所以这里的实现确实考虑了这一点并且应该在所有输入上只使用常量堆栈空间:

So the calls of the continuations need to return to the trampoline, too. (Actually, the call in the no children base case does not, because it will happen at most once before returning to trampoline, and the same will then be true for the second recursive call.) So here's an implementation that does take this into consideration and should use only constant stack space on all inputs:

(defn tree->newick
  ([tree]
     (trampoline tree->newick tree identity))
  ([tree cont]
     (let [{:keys [id children to-parent]} tree
           dist (double to-parent)]     ; to-parent may be a rational
       (if children
         (fn [] (tree->newick
                 (first children)
                 (fn [s1] (tree->newick
                           (second children)
                           (fn [s2] #(cont (str "(" s1 "," s2 "):" dist)))))))
         (cont (str (name id) ":" dist))))))

这篇关于以 Newick 格式懒惰地打印一棵树的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆