差异列表的显式纯功能数据结构 [英] Explicit Purely-Functional Data-Structure For Difference Lists

查看:114
本文介绍了差异列表的显式纯功能数据结构的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在Haskell中, 差异列表

[a]具有有效串联操作的列表表示形式

似乎是在功能上实现的组成.

但是,还必须使用数据结构以某种方式在计算机内存中表示功能和(动态)功能组合,这引发了以下问题:解决方案

卡尔在评论中打了它.我们可以写

data TList a = Nil | Single a | Node !(TList a) (TList a)

singleton :: a -> TList a
singleton = Single

instance Monoid (TList a) where
  mempty = Nil
  mappend = Node

我们可以通过推导Foldable来获得toList,但是让我们将其写出来以查看正在发生的事情.

instance Foldable TList where
  foldMap _ Nil = mempty
  foldMap f (Single a) = f a
  foldMap f (Node t u) = foldMap f t <> foldMap f u

  toList as0 = go as0 [] where
    go Nil k = k
    go (Single a) k = a : k
    go (Node l r) k = go l (go r k)

toList为O(n),其中n是内部节点的总数(即用于形成TListmappend操作的总数).这应该很清楚:每个Node都要检查一次. memptymappendsingleton显然都是O(1).

这与DList完全相同:

newtype DList a = DList ([a] -> [a])
singletonD :: a -> DList a
singletonD a = DList (a:)
instance Monoid (DList a) where
  mempty = DList id
  mappend (DList f) (DList g) = DList (f . g)
instance Foldable DList where
  foldr c n xs = foldr c n (toList xs)
  toList (DList f) = f []

为什么在操作上是一样的?因为,正如您在问题中指出的那样,函数在内存中表示为树.它们被表示为看起来很像TList的树! singletonD x产生一个包含(无聊的)(:)和一个令人兴奋的x的闭包.应用时,它会执行O(1)的工作. mempty仅产生id函数,该函数在应用时将执行O(1). mappend as bs产生一个闭包,当应用闭包时,O(1)会自己工作,加上O(length as + length bs)在其子级中起作用.

TListDList生成的树的形状实际上是相同的.您应该能够使自己确信,当增量使用它们时,它们也具有相同的渐近性能:在每种情况下,该程序都必须沿着树的左脊柱走到第一个列表元素.


DListTList都可以在建立时使用,然后仅使用一次.一次构建并多次转换为列表时,它们同样糟糕.

就像Will Ness用类似的类型显示的那样,如果您想添加对 deconstructing 表示形式的支持,则显式树表示形式会更好,因为您实际上可以使用该结构. TList可以支持合理有效的uncons操作(在工作时改进结构).为了同样有效地使用unsnoc,您将需要使用更高级的表示形式(可连接的双端队列).此实现还可能使高速缓存的性能降低.您可以切换到不使用缓存的数据结构,但实际上可以保证它很复杂.

In Haskell, difference lists, in the sense of

[a] representation of a list with an efficient concatenation operation

seem to be implemented in terms of function composition.

Functions and (dynamic) function compositions, though, must also be represented somehow in the computer's memory using data structures, which raises the question of how dlists could be implemented in Haskell without using function compositions, but, rather, through some basic purely-functional node-based data structures. How could this be done with the same performance guarantees as those of function composition?

解决方案

Carl hit it in his comment. We can write

data TList a = Nil | Single a | Node !(TList a) (TList a)

singleton :: a -> TList a
singleton = Single

instance Monoid (TList a) where
  mempty = Nil
  mappend = Node

We could get toList by just deriving Foldable, but let's write it out instead to see just what's going on.

instance Foldable TList where
  foldMap _ Nil = mempty
  foldMap f (Single a) = f a
  foldMap f (Node t u) = foldMap f t <> foldMap f u

  toList as0 = go as0 [] where
    go Nil k = k
    go (Single a) k = a : k
    go (Node l r) k = go l (go r k)

toList is O(n), where n is the total number of internal nodes (i.e., the total number of mappend operations used to form the TList). This should be pretty clear: each Node is inspected exactly once. mempty, mappend, and singleton are each obviously O(1).

This is exactly the same as for a DList:

newtype DList a = DList ([a] -> [a])
singletonD :: a -> DList a
singletonD a = DList (a:)
instance Monoid (DList a) where
  mempty = DList id
  mappend (DList f) (DList g) = DList (f . g)
instance Foldable DList where
  foldr c n xs = foldr c n (toList xs)
  toList (DList f) = f []

Why, operationally, is this the same? Because, as you indicate in your question, the functions are represented in memory as trees. And they're represented as trees that look a lot like TLists! singletonD x produces a closure containing a (boring) (:) and an exciting x. When applied, it does O(1) work. mempty just produces the id function, which when applied does O(1) work. mappend as bs produces a closure that, when applied, does O(1) work of its own, plus O(length as + length bs) work in its children.

The shapes of the trees produced for TList and DList are actually the same. You should be able to convince yourself that they also have identical asymptotic performance when used incrementally: in each case, the program has to walk down the left spine of the tree to get to the first list element.


Both DList and TList are equally okay when built up and then used only once. They're equally lousy when built once and converted to lists multiple times.

As Will Ness showed with a similar type, the explicit tree representation is better if you want to add support for deconstructing the representation, as you can actually get your hands on the structure. TList can support a reasonably efficient uncons operation (that improves the structure as it works). To get efficient unsnoc as well, you'll need to use a fancier representation (a catenable deque). This implementation also has potentially wretched cache performance. You can switch to a cache-oblivious data structure, but it is practically guaranteed to be complicated.

这篇关于差异列表的显式纯功能数据结构的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆