什么构成了列表以外类型的折叠? [英] What constitutes a fold for types other than list?

查看:26
本文介绍了什么构成了列表以外类型的折叠?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

考虑一个单链表.它看起来像

Consider a single-linked list. It looks something like

data List x = Node x (List x) | End

很自然的定义一个折叠函数比如

It is natural to define a folding function such as

reduce :: (x -> y -> y) -> y -> List x -> y

从某种意义上说,reduce f x0 将每个 Node 替换为 f 并将每个 End 替换为 x0.这就是 Prelude 所说的 折叠.

In a sense, reduce f x0 replaces every Node with f and every End with x0. This is what the Prelude refers to as a fold.

现在考虑一个简单的二叉树:

Now consider a simple binary tree:

data Tree x = Leaf x | Branch (Tree x) (Tree x)

同样很自然地定义一个函数,例如

It is similarly natural to define a function such as

reduce :: (y -> y -> y) -> (x -> y) -> Tree x -> y

请注意,这个减少有一个完全不同的特点;虽然基于列表的方法本质上是顺序的,但这种新的基于树的方法更具有分而治之的感觉.你甚至可以想象在那里扔几个 par 组合器.(你会把这样的东西放在列表版本的什么地方?)

Notice that this reduction has a quite different character; whereas the list-based one is inherently sequential, this new tree-based one has more of a divide-and-conquer feel to it. You could even imagine throwing a few par combinators in there. (Where would you put such a thing in the list version?)

我的问题:这个功能是否仍然归类为折叠",还是其他什么?(如果是,那是什么?)

My question: Is this function still classified as a "fold", or is it something else? (And if so, what is it?)

基本上每当有人谈论折叠时,他们总是谈论折叠列表,这本质上是顺序的.我想知道顺序"是否是折叠定义的一部分,或者这是否只是最常用的折叠示例的巧合属性.

Basically whenever anybody talks about folding, they always talk about folding lists, which is inherently sequential. I'm wondering whether "sequential" is part of the definition of what a fold is, or whether this is merely a coincidental property of the most commonly-used example of folding.

推荐答案

Tikhon 解决了技术问题.我想我会尽量简化他所说的.

Tikhon's got the technical stuff down. I think I'll try to simplify down from what he said.

不幸的是,多年来,折叠"一词已经变得模棱两可,表示以下两种情况之一:

The term "folding" has, unfortunately, become ambiguous over the years to mean one of two things:

  1. 按某种顺序依次减少集合.在 Haskell 中,这就是 larsmans 提出的 Foldable 类中折叠"的含义.
  2. 您要求的概念:根据代数数据类型的结构破坏"(与构造相反)、观察"或消除"代数数据类型.也称为 catamorphism.
  1. Reducing a collection sequentially in some order. In Haskell, this is what "folding" means in the Foldable class, which larsmans brings up.
  2. The notion you asked for: "destructing" (opposite of constructing), "observing" or "eliminating" an algebraic data type according to its structure. Also called a catamorphism.

可以通用地定义这两个概念,以便一个参数化函数能够为各种类型执行此操作.Tikhon 展示了如何处理第二种情况.

It is possible to define both of these notions generically so that one parametrized function is capable of doing it for a variety of types. Tikhon shows how to do in the second case.

但大多数情况下,使用 Fix 和代数等方法来解决问题是多余的.让我们找出一种更简单的方法来为任何代数数据类型编写折叠.我们将使用 Maybe、配对、列表和树作为示例:

But most often going the whole way with Fix and the algebras and such is overkill. Let's work out a simpler way of writing the fold for any algebraic data type. We'll use Maybe, pairs, lists and trees as our examples:

data Maybe a = Nothing | Just a
data Pair a b = Pair a b
data List a = Nil | Cons a (List a)
data Tree x = Leaf x | Branch (Tree x) (Tree x)
data BTree a = Empty | Node a (BTree a) (BTree a)

注意 Pair 不是递归的;我要展示的过程并不假设折叠"类型是递归的.人们通常不称这种情况为折叠",但它确实是同一概念的非递归情况.

Note that Pair is not recursive; the procedure I'm going to show doesn't assume that the "fold" type is recursive. People don't usually call this case a "fold," but it's really the non-recursive case of the same concept.

第一步:给定类型的折叠将消耗折叠类型并产生一些参数类型作为其结果.我喜欢称后者为 r(表示结果").所以:

First step: the fold for a given type will consume the folded type and produce some parameter type as its result. I like to call the latter r (for "result"). So:

foldMaybe :: ... -> Maybe a -> r
foldPair  :: ... -> Pair a b -> r
foldList  :: ... -> List a -> r
foldTree  :: ... -> Tree a -> r
foldBTree :: ... -> BTree a -> r

第二步:除了最后一个参数(结构的那个)之外,折叠接受与类型具有构造函数一样多的参数.Pair 有一个构造函数,我们其他的例子有两个,所以:

Second step: in addition to the last argument (the one for the structure), the fold takes as many arguments as the type has constructors. Pair has one constructor and our other examples have two, so:

foldMaybe :: nothing -> just -> Maybe a -> r
foldPair  :: pair -> Pair a b -> r 
foldList  :: nil -> cons -> List a -> r
foldTree  :: leaf -> branch -> Tree a -> r
foldBTree :: empty -> node -> BTree a -> r

第三步:这些参数中的每一个都与它对应的构造函数具有相同的数量.让我们将构造函数视为函数,并写出它们的类型(确保类型变量与我们正在编写的签名中的类型变量匹配):

Third step: each of these arguments has the same arity as the constructor it corresponds to. Let's treat the constructors as functions, and write out their types (making sure the type variables match up with the ones in the signatures we're writing):

Nothing :: Maybe a
Just    :: a -> Maybe a

Pair    :: a -> b -> Pair a b

Nil     :: List a
Cons    :: a -> List a -> List a

Leaf    :: a -> Tree a
Branch  :: Tree a -> Tree a -> Tree a

Empty   :: BTree a
Node    :: a -> BTree a -> BTree a -> BTree a

第 4 步:在每个构造函数的签名中,我们将用我们的类型变量 r(我们在折叠签名中使用的)替换它构造的所有出现的数据类型:

Step 4: in the signature of each constructor, we will replace all occurrences of the data type it constructs with our type variable r (that we're using in our fold signatures):

nothing := r
just    := a -> r

pair    := a -> b -> r

nil     := r
cons    := a -> r -> r

leaf    := a -> r
branch  := r -> r -> r

empty   := r
node    := a -> r -> r -> r

如您所见,我已将第二步中生成的签名分配"给我的虚拟类型变量.现在第 5 步:将这些填入之前的草图折叠签名:

As you can see, I've "assigned" the resulting signatures to my dummy type variables from the second step. Now Step 5: fill those in into the earlier sketch fold signatures:

foldMaybe :: r -> (a -> r) -> Maybe a -> r
foldPair  :: (a -> b -> r) -> Pair a b -> r 
foldList  :: r -> (a -> r -> r) -> List a -> r
foldTree  :: (a -> r) -> (r -> r -> r) -> Tree a -> r
foldBTree :: r -> (a -> r -> r -> r) -> BTree a -> r

现在,这些是这些类型折叠的签名.它们有一个有趣的参数顺序,因为我通过从 data 声明和构造函数类型中读取折叠类型来机械地做到这一点,但是由于某种原因,在函数式编程中,通常将基本情况放在 中data 定义但递归 case 处理程序首先在 fold 定义中.没问题!让我们重新调整它们以使其更传统:

Now, these are signatures for the folds of those types. They have a funny argument order, because I did this mechanically by reading the fold type off the data declarations and constructor types, but for some reason in functional programming it's conventional to put base cases first in data definitions yet recursive case handlers first in fold definitions. No problem! Let's reshuffle them to make them more conventional:

foldMaybe :: (a -> r) -> r -> Maybe a -> r
foldPair  :: (a -> b -> r) -> Pair a b -> r 
foldList  :: (a -> r -> r) -> r -> List a -> r
foldTree  :: (r -> r -> r) -> (a -> r) -> Tree a -> r
foldBTree :: (a -> r -> r -> r) -> r -> BTree a -> r

<小时>

定义也可以机械填写.让我们选择 foldBTree 并逐步实现它.给定类型的折叠是我们发现的满足此条件的类型的一个函数:使用类型的构造函数折叠是对该类型的恒等函数(您得到的结果与开始时的值相同).


The definitions can also be filled in mechanically. Let's pick foldBTree and implement it step by step. The fold for a given type is the one function of the type we figured out that meets this condition: folding with the type's constructors is an identity function over that type (you get the same result as the value you started with).

我们会这样开始:

foldBTree :: (a -> r -> r -> r) -> r -> BTree a -> r
foldBTree = ???

我们知道它需要三个参数,所以我们可以添加变量来反映它们.我将使用较长的描述性名称:

We know it takes three arguments, so we can add variables to reflect them. I'll use long descriptive names:

foldBTree :: (a -> r -> r -> r) -> r -> BTree a -> r
foldBTree branch empty tree = ???

查看data 声明,我们知道BTree 有两个可能的构造函数.我们可以将定义拆分为一个 case,并为它们的元素填写变量:

Looking at the data declaration, we know BTree has two possible constructors. We can split the definition into a case for each, and fill out variables for their elements:

foldBTree :: (a -> r -> r -> r) -> r -> BTree a -> r
foldBTree branch empty Empty = ???
foldBTree branch empty (Branch a l r) = ???
    -- Let's use comments to keep track of the types:
    -- a :: a
    -- l, r :: BTree a

现在,缺少undefined之类的东西,填充第一个等式的唯一方法是empty:

Now, short of something like undefined, the only way to fill in the first equation is with empty:

foldBTree :: (a -> r -> r -> r) -> r -> BTree a -> r
foldBTree branch empty Empty = empty
foldBTree branch empty (Branch a l r) = ???
    -- a :: a
    -- l, r :: BTree a

我们如何填写第二个等式?同样,缺少 undefined,我们有这个:

How do we fill in the second equation? Again, short of undefined, we have this:

branch :: a -> r -> r -> r
a      :: a
l, r   :: BTree a

如果我们有 subfold :: BTree a ->r,我们可以做branch a (subfold l) (subfold r) :: r.但是当然,我们可以轻松地编写子折叠":

If we had subfold :: BTree a -> r, we could do branch a (subfold l) (subfold r) :: r. But of course, we can write 'subfold' easily:

foldBTree :: (a -> r -> r -> r) -> r -> BTree a -> r
foldBTree branch empty Empty = empty
foldBTree branch empty (Branch a l r) = branch a (subfold l) (subfold r)
    where subfold = foldBTree branch empty

这是BTree的折叠,因为foldBTree Branch Empty anyTree == anyTree.请注意,foldBTree 并不是这种类型的唯一函数;还有这个:

This is the fold for BTree, because foldBTree Branch Empty anyTree == anyTree. Note that foldBTree isn't the only function of this type; there's also this:

mangleBTree :: (a -> r -> r -> r) -> r -> BTree a -> r
mangleBTree branch empty Empty = empty
mangleBTree branch empty (Branch a l r) = branch a (submangle r) (submangle l)
    where submangle = mangleBTree branch empty

但一般情况下,mangleBTree 没有所需的属性;例如,如果我们有 foo = Branch 1 (Branch 2 Empty Empty) Empty,它遵循 mangleBTree Branch Empty foo/= foo.所以 mangleBTree 虽然类型正确,但不是折叠.

But in general, mangleBTree doesn't have the required property; for example if we have foo = Branch 1 (Branch 2 Empty Empty) Empty, it follows that mangleBTree Branch Empty foo /= foo. So mangleBTree, though it has the correct type, is not the fold.

现在,让我们从细节上退后一步,专注于 mangleTree 示例的最后一点.折叠(在结构意义上,我的答案顶部的#2)无非是代数类型的最简单、非平凡的函数,这样,当您将类型的构造函数作为参数传入时,它成为该类型的恒等函数.(非平凡我的意思是像 foo f z xs = xs 这样的东西是不允许的.)

Now, let's take a step back from details, and concentrate on that last point with the mangleTree example. A fold (in the structural sense, #2 at the top of my answer) is nothing more and nothing else than the simplest, non-trivial function for an algebraic type such that, when you give pass in the type's constructors as its arguments, it becomes the identity function for that type. (By nontrivial I mean that things like foo f z xs = xs is not allowed.)

这是非常重要的.我喜欢的思考方式有以下两种:

This is very significant. Two ways I like to think about it are as follows:

  1. 给定类型的折叠可以看到"该类型的任何值包含的所有信息.(这就是为什么它能够使用类型的构造函数从头开始完美地重​​构"该类型的任何值.)
  2. 折叠是该类型最通用的消费者"功能.任何使用所讨论类型的值的函数都可以被编写,以便它从该类型使用的唯一操作是折叠和构造函数.(尽管某些函数的 fold-only 版本难以编写且性能不佳;尝试使用 foldr 编写 tail :: [a] -> [a](:)[] 作为一个痛苦的练习.)
  1. The fold for a given type can "see" all the information contained by any value of that type. (This is why it's able to perfectly "reconstruct" any value of that type from the ground up using the type's constructors.)
  2. The fold is the most general "consumer" function for that type. Any function that consumes a value of the type in question can be written so that the only operations it uses from that type are the fold and the constructors. (Though the fold-only versions of some functions are hard to write and perform badly; try writing tail :: [a] -> [a] with foldr, (:) and [] as a painful exercise.)

第二点更进一步,因为您甚至不需要构造函数.您可以在不使用 data 声明或构造函数的情况下实现任何代数类型,只使用折叠:

And the second point goes even further, in that you don't even need the constructors. You can implement any algebraic type without using data declarations or constructors, using nothing but folds:

{-# LANGUAGE RankNTypes #-}

-- | A Church-encoded list is a function that takes the two 'foldr' arguments
-- and produces a result from them.
newtype ChurchList a = 
    ChurchList { runList :: forall r. 
                            (a -> r -> r)  -- ^ first arg of 'foldr'
                         -> r              -- ^ second arg of 'foldr'
                         -> r              -- ^ 'foldr' result
               }

-- | Convenience function: make a ChurchList out of a regular list
toChurchList :: [a] -> ChurchList a
toChurchList xs = ChurchList (kons knil -> foldr kons knil xs)

-- | 'toChurchList' isn't actually needed, however, we can make do without '[]'
-- completely.
cons :: a -> ChurchList a -> ChurchList a
cons x xs = ChurchList (f z -> f x (runlist xs f z))

nil :: ChurchList a
nil = ChurchList (f z -> z)

foldr' :: (a -> r -> r) -> r -> ChurchList a -> r
foldr' f z xs = runList xs f z

head :: ChurchList a -> Maybe a
head = foldr' ((Just .) . const) Nothing

append :: ChurchList a -> ChurchList a -> ChurchList a
append xs ys = foldr' cons ys xs

-- | Convert a 'ChurchList' to a regular list.
fromChurchList :: ChurchList a -> [a]
fromChurchList xs = runList xs (:) []

作为练习,您可以尝试以这种方式编写其他类型(使用 RankNTypes 扩展——阅读本文作为入门).这种技术称为教会编码,有时在实际编程中很有用——例如,GHC 使用某些东西调用foldr/build融合优化列表代码,去除中间列表;请参阅此 Haskell Wiki 页面,并注意 build 的类型:

As an exercise you can try writing other types in this way (which uses the RankNTypes extension—read this for a primer). This technique is called Church encoding, and is sometimes useful in actual programming—for example, GHC uses something called foldr/build fusion to optimize list code to remove intermediate lists; see this Haskell Wiki page, and note the type of build:

build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []

除了newtype,这与我上面的fromChurchList 相同.基本上,GHC 用于优化列表处理代码的规则之一是:

Except for the newtype, this is the same as my fromChurchList above. Basically, one of the rules that GHC uses to optimize list processing code is this:

-- Don't materialize the list if all we're going to do with it is
-- fold it right away:
foldr kons knil (fromChurchList xs) ==> runChurchList xs kons knil

通过实现基本列表函数以在内部使用 Church 编码,积极地内联它们的定义,并将此规则应用于内联代码,诸如 map 之类的函数的嵌套使用可以融合成一个紧密的循环.

By implementing the basic list functions to use Church encodings internally, inlining their definitions aggressively, and applying this rule to the inlined code, nested uses of functions like map can be fused into a tight loop.

这篇关于什么构成了列表以外类型的折叠?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆