差异列表的显式纯功能数据结构 [英] Explicit Purely-Functional Data-Structure For Difference Lists
问题描述
在Haskell中, 差异列表
[a]具有有效串联操作的列表表示形式
似乎是在功能上实现的组成.
但是,还必须使用数据结构以某种方式在计算机内存中表示功能和(动态)功能组合,这引发了以下问题:解决方案卡尔在评论中打了它.我们可以写
data TList a = Nil | Single a | Node !(TList a) (TList a)
singleton :: a -> TList a
singleton = Single
instance Monoid (TList a) where
mempty = Nil
mappend = Node
我们可以通过推导Foldable
来获得toList
,但是让我们将其写出来以查看正在发生的事情.
instance Foldable TList where
foldMap _ Nil = mempty
foldMap f (Single a) = f a
foldMap f (Node t u) = foldMap f t <> foldMap f u
toList as0 = go as0 [] where
go Nil k = k
go (Single a) k = a : k
go (Node l r) k = go l (go r k)
toList
为O(n),其中n是内部节点的总数(即用于形成TList
的mappend
操作的总数).这应该很清楚:每个Node
都要检查一次. mempty
,mappend
和singleton
显然都是O(1).
这与DList
完全相同:
newtype DList a = DList ([a] -> [a])
singletonD :: a -> DList a
singletonD a = DList (a:)
instance Monoid (DList a) where
mempty = DList id
mappend (DList f) (DList g) = DList (f . g)
instance Foldable DList where
foldr c n xs = foldr c n (toList xs)
toList (DList f) = f []
为什么在操作上是一样的?因为,正如您在问题中指出的那样,函数在内存中表示为树.它们被表示为看起来很像TList
的树! singletonD x
产生一个包含(无聊的)(:)
和一个令人兴奋的x
的闭包.应用时,它会执行O(1)的工作. mempty
仅产生id
函数,该函数在应用时将执行O(1). mappend as bs
产生一个闭包,当应用闭包时,O(1)会自己工作,加上O(length as + length bs)在其子级中起作用.
为TList
和DList
生成的树的形状实际上是相同的.您应该能够使自己确信,当增量使用它们时,它们也具有相同的渐近性能:在每种情况下,该程序都必须沿着树的左脊柱走到第一个列表元素.
DList
和TList
都可以在建立时使用,然后仅使用一次.一次构建并多次转换为列表时,它们同样糟糕.
就像Will Ness用类似的类型显示的那样,如果您想添加对 deconstructing 表示形式的支持,则显式树表示形式会更好,因为您实际上可以使用该结构. TList
可以支持合理有效的uncons
操作(在工作时改进结构).为了同样有效地使用unsnoc
,您将需要使用更高级的表示形式(可连接的双端队列).此实现还可能使高速缓存的性能降低.您可以切换到不使用缓存的数据结构,但实际上可以保证它很复杂.
In Haskell, difference lists, in the sense of
[a] representation of a list with an efficient concatenation operation
seem to be implemented in terms of function composition.
Functions and (dynamic) function compositions, though, must also be represented somehow in the computer's memory using data structures, which raises the question of how dlists could be implemented in Haskell without using function compositions, but, rather, through some basic purely-functional node-based data structures. How could this be done with the same performance guarantees as those of function composition?
Carl hit it in his comment. We can write
data TList a = Nil | Single a | Node !(TList a) (TList a)
singleton :: a -> TList a
singleton = Single
instance Monoid (TList a) where
mempty = Nil
mappend = Node
We could get toList
by just deriving Foldable
, but let's write it out instead to see just what's going on.
instance Foldable TList where
foldMap _ Nil = mempty
foldMap f (Single a) = f a
foldMap f (Node t u) = foldMap f t <> foldMap f u
toList as0 = go as0 [] where
go Nil k = k
go (Single a) k = a : k
go (Node l r) k = go l (go r k)
toList
is O(n), where n is the total number of internal nodes (i.e., the total number of mappend
operations used to form the TList
). This should be pretty clear: each Node
is inspected exactly once. mempty
, mappend
, and singleton
are each obviously O(1).
This is exactly the same as for a DList
:
newtype DList a = DList ([a] -> [a])
singletonD :: a -> DList a
singletonD a = DList (a:)
instance Monoid (DList a) where
mempty = DList id
mappend (DList f) (DList g) = DList (f . g)
instance Foldable DList where
foldr c n xs = foldr c n (toList xs)
toList (DList f) = f []
Why, operationally, is this the same? Because, as you indicate in your question, the functions are represented in memory as trees. And they're represented as trees that look a lot like TList
s! singletonD x
produces a closure containing a (boring) (:)
and an exciting x
. When applied, it does O(1) work. mempty
just produces the id
function, which when applied does O(1) work. mappend as bs
produces a closure that, when applied, does O(1) work of its own, plus O(length as + length bs) work in its children.
The shapes of the trees produced for TList
and DList
are actually the same. You should be able to convince yourself that they also have identical asymptotic performance when used incrementally: in each case, the program has to walk down the left spine of the tree to get to the first list element.
Both DList
and TList
are equally okay when built up and then used only once. They're equally lousy when built once and converted to lists multiple times.
As Will Ness showed with a similar type, the explicit tree representation is better if you want to add support for deconstructing the representation, as you can actually get your hands on the structure. TList
can support a reasonably efficient uncons
operation (that improves the structure as it works). To get efficient unsnoc
as well, you'll need to use a fancier representation (a catenable deque). This implementation also has potentially wretched cache performance. You can switch to a cache-oblivious data structure, but it is practically guaranteed to be complicated.
这篇关于差异列表的显式纯功能数据结构的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!