为什么差异列表比Haskell中的常规串联更有效? [英] Why are difference lists more efficient than regular concatenation in Haskell?

查看:68
本文介绍了为什么差异列表比Haskell中的常规串联更有效?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在通过向您了解Haskell 在线预订,并进入一章,作者在其中解释一些列表串联可能效率低下:例如

I am currently working my way through the Learn you a Haskell book online, and have come to a chapter where the author is explaining that some list concatenations can be inefficient: For example

((((a ++ b) ++ c) ++ d) ++ e) ++ f

据说效率低下.作者提出的解决方案是使用定义为

is supposedly inefficient. The solution the author comes up with is to use 'difference lists' defined as

newtype DiffList a = DiffList {getDiffList :: [a] -> [a] }

instance Monoid (DiffList a) where
    mempty = DiffList (\xs -> [] ++ xs)
    (DiffList f) `mappend` (DiffList g) = DiffList (\xs -> f (g xs))

在某些情况下,我很难理解DiffList为什么比简单串联更有效的计算.有人可以简单地向我解释为什么上面的示例如此低效,而DiffList用什么方式解决了这个问题?

I am struggling to understand why DiffList is more computationally efficient than a simple concatenation in some cases. Could someone explain to me in simple terms why the above example is so inefficient, and in what way the DiffList solves this problem?

推荐答案

问题所在

((((a ++ b) ++ c) ++ d) ++ e) ++ f

是嵌套. (++)的应用程序是左嵌套的,这很糟糕;右嵌套

is the nesting. The applications of (++) are left-nested, and that's bad; right-nesting

a ++ (b ++ (c ++ (d ++ (e ++f))))

不会有问题.这是因为(++)被定义为

would not be a problem. That is because (++) is defined as

[] ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)

因此要找到要使用的方程式,实现必须深入到表达式树中

so to find which equation to use, the implementation must dive into the expression tree

             (++)
             /  \
          (++)   f
          /  \
       (++)   e
       /  \
    (++)   d
    /  \
 (++)   c
 /  \
a    b

直到找到左操作数是否为空.如果不为空,则取其头部并将其冒泡到顶部,但保持左操作数的尾部不变,因此当需要连接的下一个元素时,将再次开始相同的过程.

until it finds out whether the left operand is empty or not. If it's not empty, its head is taken and bubbled to the top, but the tail of the left operand is left untouched, so when the next element of the concatenation is demanded, the same procedure starts again.

将连接右对齐时,(++)的左操作数始终在顶部,并且检查是否为空/冒泡为O(1).

When the concatenations are right-nested, the left operand of (++) is always at the top, and checking for emptiness/bubbling up the head are O(1).

但是,如果将连接嵌套在左侧,则深度为n层,才能到达第一个元素,对于结果的每个元素,都必须遍历树的n节点(来自第一个列表对于那些来自第二个等的人.

But when the concatenations are left-nested, n layers deep, to reach the first element, n nodes of the tree must be traversed, for each element of the result (coming from the first list, n-1 for those coming from the second etc.).

让我们考虑a = "hello" in

hi = ((((a ++ b) ++ c) ++ d) ++ e) ++ f

,我们要评估take 5 hi.因此,首先,必须检查是否

and we want to evaluate take 5 hi. So first, it must be checked whether

(((a ++ b) ++ c) ++ d) ++ e

为空.为此,必须检查是否

is empty. For that, it must be checked whether

((a ++ b) ++ c) ++ d

为空.为此,必须检查是否

is empty. For that, it must be checked whether

(a ++ b) ++ c

为空.为此,必须检查是否

is empty. For that, it must be checked whether

a ++ b

为空.为此,必须检查是否

is empty. For that, it must be checked whether

a

为空. ew不是,所以我们可以再次冒泡,组装

is empty. Phew. It isn't, so we can bubble up again, assembling

a ++ b                             = 'h':("ello" ++ b)
(a ++ b) ++ c                      = 'h':(("ello" ++ b) ++ c)
((a ++ b) ++ c) ++ d               = 'h':((("ello" ++ b) ++ c) ++ d)
(((a ++ b) ++ c) ++ d) ++ e        = 'h':(((("ello" ++ b) ++ c) ++ d) ++ e)
((((a ++ b) ++ c) ++ d) ++ e) ++ f = 'h':((((("ello" ++ b) ++ c) ++ d) ++ e) ++ f)

对于'e',我们必须重复,对于'l'也是如此...

and for the 'e', we must repeat, and for the 'l's too...

绘制树的一部分,冒泡是这样的:

Drawing a part of the tree, the bubbling up goes like this:

            (++)
            /  \
         (++)   c
         /  \
'h':"ello"   b

首先成为

     (++)
     /  \
   (:)   c
  /   \
'h'   (++)
      /  \
 "ello"   b

然后

      (:)
      / \
    'h' (++)
        /  \
     (++)   c
     /  \
"ello"   b

一路回到顶部.最终成为顶级(:)的右子级的树的结构与原始树的结构完全相同,除非在

时最左边的列表为空.

all the way back to the top. The structure of the tree that becomes the right child of the top-level (:) finally, is exactly the same as the structure of the original tree, unless the leftmost list is empty, when the

 (++)
 /  \
[]   b

节点折叠为b.

因此,如果您有短列表的左嵌套级联,则级联将变成二次方,因为要获得级联的头是O(嵌套深度)操作.通常,左嵌套的串联

So if you have left-nested concatenations of short lists, the concatenation becomes quadratic because to get the head of the concatenation is an O(nesting-depth) operation. In general, the concatenation of a left-nested

(...((a_d ++ a_{d-1}) ++ a_{d-2}) ...) ++ a_2) ++ a_1

O(sum [i * length a_i | i <- [1 .. d]])以进行全面评估.

有了差异列表(为便于说明,没有使用新型包装器),组成是否为左套无关紧要

With difference lists (sans the newtype wrapper for simplicity of exposition), it's not important whether the compositions are left-nested

((((a ++) . (b ++)) . (c ++)) . (d ++)) . (e ++)

或右嵌套.一旦遍历嵌套到达(a ++),该(++)就会被提升到表达式树的顶部,因此到达a的每个元素再次为O(1).

or right-nested. Once you have traversed the nesting to reach the (a ++), that (++) is hoisted to the top of the expression tree, so getting at each element of a is again O(1).

事实上,只要您需要第一个元素,整个组合就会与差异列表重新关联

In fact, the whole composition is reassociated with difference lists, as soon as you require the first element,

((((a ++) . (b ++)) . (c ++)) . (d ++)) . (e ++) $ f

成为

((((a ++) . (b ++)) . (c ++)) . (d ++)) $ (e ++) f
(((a ++) . (b ++)) . (c ++)) $ (d ++) ((e ++) f)
((a ++) . (b ++)) $ (c ++) ((d ++) ((e ++) f))
(a ++) $ (b ++) ((c ++) ((d ++) ((e ++) f)))
a ++ (b ++ (c ++ (d ++ (e ++ f))))

之后,每个列表都是上一个列表被消耗之后顶级(++)的紧靠左操作数.

and after that, each list is the immediate left operand of the top-level (++) after the preceding list has been consumed.

重要的是,前置函数(a ++)可以在不检查其自变量的情况下开始产生其结果,从而使与

的重新关联

The important thing in that is that the prepending function (a ++) can start producing its result without inspecting its argument, so that the reassociation from

             ($)
             / \
           (.)  f
           / \
         (.) (e ++)
         / \
       (.) (d ++)
       / \
     (.) (c ++)
     / \
(a ++) (b ++)

通过

           ($)---------
           /           \
         (.)           ($)
         / \           / \
       (.) (d ++) (e ++)  f
       / \
     (.) (c ++)
     / \
(a ++) (b ++)

     ($)
     / \
(a ++) ($)
       / \
  (b ++) ($)
         / \
    (c ++) ($)
           / \
      (d ++) ($)
             / \
        (e ++)  f

不需要了解有关最终列表f的组成函数的任何信息,因此它只是一个O(depth)重写.然后是顶级

doesn't need to know anything about the composed functions of the final list f, so it's just an O(depth) rewriting. Then the top-level

     ($)
     / \
(a ++)  stuff

成为

 (++)
 /  \
a    stuff

a的所有元素都可以一步获得.在这个例子中,我们只有纯左嵌套,只需要重写一次即可.如果代替(例如)(d ++)的功能是在左侧嵌套的组合(((g ++) . (h ++)) . (i ++)) . (j ++),则顶层重新关联将保持不变,并且当它成为顶部的左操作数时,它将重新关联. -c ($)级别,直到所有先前的列表都用完.

and all elements of a can be obtained in one step. In this example, where we had pure left-nesting, only one rewriting is necessary. If instead of (for example) (d ++) the function in that place had been a left-nested composition, (((g ++) . (h ++)) . (i ++)) . (j ++), the top-level reassociation would leave that untouched and this would be reassociated when it becomes the left operand of the top-level ($) after all previous lists have been consumed.

所有重新关联所需的总工作量为O(number of lists),因此串联的总成本为O(number of lists + sum (map length lists)). (这意味着您也可以通过插入很多位于左侧的([] ++)来带来不良的性能.)

The total work needed for all reassociations is O(number of lists), so the overall cost for the concatenation is O(number of lists + sum (map length lists)). (That means you can bring bad performance to this too, by inserting a lot of deeply left-nested ([] ++).)

newtype DiffList a = DiffList {getDiffList :: [a] -> [a] }

instance Monoid (DiffList a) where
    mempty = DiffList (\xs -> [] ++ xs)
    (DiffList f) `mappend` (DiffList g) = DiffList (\xs -> f (g xs))

只需将其包装起来,以便更方便地进行抽象处理.

just wraps that so that it is more convenient to handle abstractly.

DiffList (a ++) `mappend` DiffList (b ++) ~> DiffList ((a ++) . (b++))

请注意,它仅对不需要检查其参数的函数才有效以开始产生输出,如果将任意函数包装在DiffList中,则无法保证效率.特别是,追加((++ a),是否包裹)可以在右嵌套时创建(++)的左嵌套树,因此,如果暴露了DiffList构造函数,则可以创建O(n²)串联行为.

Note that it is only efficient for functions that don't need to inspect their argument to start producing output, if arbitrary functions are wrapped in DiffLists, you have no such efficiency guarantees. In particular, appending ((++ a), wrapped or not) can create left-nested trees of (++) when composed right-nested, so you can create the O(n²) concatenation behaviour with that if the DiffList constructor is exposed.

这篇关于为什么差异列表比Haskell中的常规串联更有效?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆