什么构成了除列表之外的其他类型的折叠? [英] What constitutes a fold for types other than list?

查看:133
本文介绍了什么构成了除列表之外的其他类型的折叠?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

考虑一个单链表。它看起来像

 数据列表x =节点x(列表x)|结束

定义一个折叠函数是很自然的,比如

  reduce ::(x  - > y  - > y) - > y  - >列表x  - > y 

从某种意义上讲, reduce f x0 f 和每个 End 替换每个 Node C $ C> X0 。这就是Prelude引用的 fold



现在考虑一个简单的二叉树:

 数据树x =叶x |分支(Tree x)(Tree x)

定义一个函数p>

  reduce ::(y  - > y  - > y) - > (x  - > y) - >树x  - > y 

请注意, 缩减具有完全不同的特征;而基于列表的模式本质上是连续的,这种基于树的模式更具有分而治之的感觉。你甚至可以想象在那里抛出几个 par c> combinators。 (你在哪里把这样的东西放在列表版本中?)



我的问题:这个函数仍然被归类为折叠还是其他东西? (如果是这样,它是什么?)

基本上每当有人谈论折叠时,他们总是谈论折叠列表,这本质上是顺序的。我想知道顺序是否是折叠定义的一部分,或者这仅仅是最常用折叠示例的巧合属性。

Tikhon的技术水平下降了。我想我会尽量简化他所说的话。

不幸的是,多年来,折叠一词含糊不清,意味着两件事之一:


  1. 以某种顺序依次减少集合。在Haskell中,这就是折叠在 Foldable 类中的含义,这是larsmans提出的。

  2. 您要求的概念: 破坏(与构造相反),观察或消除根据其结构的代数数据类型。也称为 catamorphism

可以一般定义这两个概念,以便一个参数化函数能够做到多种类型。 Tikhon展示了如何在第二种情况下完成。



但大多数情况下,整个过程都是用 Fix 和代数这是矫枉过正。让我们为任何代数数据类型编写一个更简单的方法。我们将使用 Maybe ,对,列表和树作为我们的例子:

  data也许a = Nothing |只需
数据Pair a b = Pair a b
数据List a = Nil |缺点(列表a)
数据树x =叶x |分支(树x)(树x)
数据BTree a =空|节点a(BTree a)(BTree a)

请注意 Pair 不是递归的;我要展示的过程并不认为fold类型是递归的。人们通常不会将这种情况称为折叠,但它确实是同一概念的非递归情况。第一步:给定类型的折叠会消耗折叠类型,并产生一些参数类型作为结果。我喜欢称后者 r (用于结果)。所以:

  foldMaybe :: ...  - >也许是 - > r 
foldPair :: ... - >配对b - > r
foldList :: ... - >列表a - > r
foldTree :: ... - >树a - > r
foldBTree :: ... - > BTree a - > r

第二步:除了最后一个参数(结构体)之外,尽可能多的参数类型具有构造函数。 Pair 有一个构造函数,我们的其他例子有两个,所以:

  foldMaybe :: nothing  - >只是 - >也许是 - > r 
foldPair :: pair - >配对b - > r
foldList :: nil - >利弊 - >列表a - > r
foldTree :: leaf - >分支 - >树a - > r
foldBTree :: empty - >节点 - > BTree a - > r

第三步:每个参数都与它所对应的构造函数具有相同的arity。让我们把构造函数当作函数,并写出它们的类型(确保类型变量与我们写的签名中的类型变量匹配):

  Nothing:也许a 
Just :: a - >也许是

Pair :: a - > b - >配对b

无::列表
缺点:: a - >列表a - >列出

Leaf :: a - >树a
分支::树a - >树a - >树a

Empty :: BTree a
Node :: a - > BTree a - > BTree a - > BTree a

第4步:在每个构造函数的签名中,我们将替换所有出现的数据类型它用我们的类型变量 r (我们在折叠签名中使用的)构造:

  nothing:= r 
just:= a - > r

pair:= a - > b - > r

无:= r
缺点:= a - > r - > r

leaf:= a - > r
branch:= r - > r - > r

empty:= r
node:= a - > r - > r - >你可以看到,我已经将结果签名分配给了我的虚拟变量类型第二步。现在第5步:填入先前的草图折叠签名:

  foldMaybe :: r  - > (a→r)→>也许是 - > r 
foldPair ::(a - > b - > r) - >配对b - > r
foldList :: r - > (a→r→r)→>列表a - > r
foldTree ::(a - > r) - > (r→r→r)→>树a - > r
foldBTree :: r - > (a→r→r→r)→> BTree a - > r

现在,这些是这些类型折叠的签名。他们有一个有趣的参数顺序,因为我通过从 data 声明和构造函数类型读取折叠类型来机械地完成此操作,但由于某些原因,在函数式编程中,首先在 data 中定义了递归案例处理程序,并首先在 fold 定义中定义了递归案例处理程序。没问题!让我们重新洗牌,让它们更传统:

  foldMaybe ::(a  - > r) - > r  - >也许是 - > r 
foldPair ::(a - > b - > r) - >配对b - > r
foldList ::(a - > r - > r) - > r - >列表a - > r
foldTree ::(r - > r - > r) - > (a→r)→>树a - > (a→r→r→r)→> r - > BTree a - > r






这些定义也可以用机械方式填写。让我们选择 foldBTree 并逐步实施它。给定类型的折叠是我们想出的类型的一个函数,它符合这个条件:使用类型的构造函数进行折叠是该类型的同一性函数(您得到的结果与您开始使用的值相同)。



我们将以这种方式开始:

  foldBTree ::(a  - > ; r→r→r)→> r  - > BTree a  - > r 
foldBTree = ???

我们知道它有三个参数,所以我们可以添加变量来反映它们。我会用很长的描述性名字:

  foldBTree ::(a  - > r  - > r  - > r) - > r  - > BTree a  - > r 
foldBTree分支空树= ???

查看 data 声明,我们知道 BTree 有两个可能的构造函数。我们可以将这个定义分解为一个个案,并为他们的元素填充变量:

  foldBTree ::(a  - > ; r→r→r)→> r  - > BTree a  - > r 
foldBTree分支空empty = ???
foldBTree分支为空(Branch a l r)= ???
- 让我们用注释来跟踪类型:
- a :: a
- l,r :: BTree a

现在,缺少像 undefined 之类的东西,填补第一个等式的唯一方法是

  foldBTree ::(a  - > r  - > ; r  - > r) - > r  - > BTree a  - > r 
foldBTree分支空empty =空
foldBTree分支空(分支a l r)= ???
- a :: a
- l,r :: BTree a

我们如何填写第二个等式?再次,缺少 undefined ,我们有这个:

 分支:: a  - > r  - > r  - > r 
a :: a
l,r :: BTree a

如果我们有 subfold :: BTree a - > r ,我们可以做 branch a(subfold l)(subfold r):: r 。但是,当然,我们可以很容易地编写'subfold':

pre $ foldBTree ::(a - > r - > r - > r) - > r - > BTree a - > r
foldBTree分支空empty =空
foldBTree分支空(分支alr)=分支a(子折叠l)(子折叠r)
其中subfold = foldBTree分支空
foldBTree

这是 BTree 的倍数。分支清空anyTree == anyTree 。请注意, foldBTree 不是这种类型的唯一功能;还有:

  mangleBTree ::(a  - > r  - > r  - > r) - > r  - > BTree a  - > r 
mangleBTree分支空empty =空
mangleBTree分支空(分支alr)=分支a(submangle r)(submangle l)
其中submangle = mangleBTree分支空

但是通常情况下, mangleBTree 没有必需的属性;例如,如果我们有 foo =分支1(分支2空空)空,那么它就是 mangleBTree分支空foo / = foo 。所以 mangleBTree ,虽然它有正确的类型,但不是折叠。




现在,让我们退一步从细节入手,并着重讨论 mangleTree 示例的最后一点。一个折叠(从结构的意义上说,我的答案顶部#2)只不过是代数类型的最简单,非平凡的函数,当你在类型的构造函数中传递作为其参数时,它将成为该类型的身份识别功能。 (非常平凡,我的意思是不允许像 foo fz xs = xs 这样的东西。)



这非常重要。我喜欢考虑以下两种方式:


  1. 给定类型的折叠可以查看任何包含的信息该类型的值。 (这就是为什么它能够使用类型的构造函数完全重构该类型的任何值)。
    fold是该类型中最通用的消费者函数。任何使用相关类型值的函数都可以这样编写,以便从该类型使用的唯一操作是fold和构造函数。 (尽管某些函数的折叠版本很难编写和执行,尝试用<$ c $编写 tail :: [a] - > [a] c> foldr
    (:) [] / li>

第二点更进一步,因为您甚至不需要构造函数。您可以在不使用 data 声明或构造函数的情况下实现任何代数类型,仅使用折叠:

  { - #LANGUAGE RankNTypes# - } 

- | Church编码列表是一个函数,它使用两个'foldr'参数
- 并从中产生一个结果。
newtype ChurchList a =
ChurchList {runList :: forall r。
(a - > r - > r) - ^'foldr'的第一个参数
- > r - ^'foldr'的第二个参数
- > r - ^'foldr'结果
}

- |便利功能:将ChurchList从常规列表中删除
toChurchList :: [a] - > ChurchList a
toChurchList xs = ChurchList(\ kons knil - > foldr kons knil xs)

- | 'toChurchList'实际上并不需要,但是,我们可以完全不做'['''
]。
cons :: a - > ChurchList a - > ChurchList a
cons x xs = ChurchList(\ fz - > fx(runlist xs fz))

nil :: ChurchList a
nil = ChurchList(\ fz - > z)

foldr'::(a - > r - > r) - > r - > ChurchList a - > r
foldr'f z xs = runList xs f z

head :: ChurchList a - >也许
head = foldr'((Just。)。const)Nothing

append :: ChurchList a - > ChurchList a - > ChurchList a
append xs ys = foldr'cons ys xs

- |将ChurchList转换为常规列表。
fromChurchList :: ChurchList a - > [a]
fromChurchList xs = runList xs(:) []

可以尝试以这种方式编写其他类型(使用 RankNTypes 扩展名阅读本文作为入门课程)。这种技术称为教会编码,有时在实际编程中很有用 - 例如GHC的使用称为 foldr / 构建融合,以优化列表代码以移除中间列表;请参阅此Haskell Wiki页面,并记下 build

  build ::(forall b。(a  - > b  - > b) - > ; b  - > b) - > [a] 
build g = g(:) []

除了 newtype ,这与我上面的 fromChurchList 相同。基本上,GHC用来优化列表处理代码的规则之一是这样的:

   - 不要实现列表if我们要做的就是
- 立即折叠:
foldr kons knil(fromChurchList xs)==> runChurchList xs kons knil

通过实施基本的列表函数来内部使用Church编码,并将此规则应用于内联代码,可以将诸如 map 之类的函数的嵌套使用融合为一个紧密循环。


Consider a single-linked list. It looks something like

data List x = Node x (List x) | End

It is natural to define a folding function such as

reduce :: (x -> y -> y) -> y -> List x -> y

In a sense, reduce f x0 replaces every Node with f and every End with x0. This is what the Prelude refers to as a fold.

Now consider a simple binary tree:

data Tree x = Leaf x | Branch (Tree x) (Tree x)

It is similarly natural to define a function such as

reduce :: (y -> y -> y) -> (x -> y) -> Tree x -> y

Notice that this reduction has a quite different character; whereas the list-based one is inherently sequential, this new tree-based one has more of a divide-and-conquer feel to it. You could even imagine throwing a few par combinators in there. (Where would you put such a thing in the list version?)

My question: Is this function still classified as a "fold", or is it something else? (And if so, what is it?)

Basically whenever anybody talks about folding, they always talk about folding lists, which is inherently sequential. I'm wondering whether "sequential" is part of the definition of what a fold is, or whether this is merely a coincidental property of the most commonly-used example of folding.

解决方案

Tikhon's got the technical stuff down. I think I'll try to simplify down from what he said.

The term "folding" has, unfortunately, become ambiguous over the years to mean one of two things:

  1. Reducing a collection sequentially in some order. In Haskell, this is what "folding" means in the Foldable class, which larsmans brings up.
  2. The notion you asked for: "destructing" (opposite of constructing), "observing" or "eliminating" an algebraic data type according to its structure. Also called a catamorphism.

It is possible to define both of these notions generically so that one parametrized function is capable of doing it for a variety of types. Tikhon shows how to do in the second case.

But most often going the whole way with Fix and the algebras and such is overkill. Let's work out a simpler way of writing the fold for any algebraic data type. We'll use Maybe, pairs, lists and trees as our examples:

data Maybe a = Nothing | Just a
data Pair a b = Pair a b
data List a = Nil | Cons a (List a)
data Tree x = Leaf x | Branch (Tree x) (Tree x)
data BTree a = Empty | Node a (BTree a) (BTree a)

Note that Pair is not recursive; the procedure I'm going to show doesn't assume that the "fold" type is recursive. People don't usually call this case a "fold," but it's really the non-recursive case of the same concept.

First step: the fold for a given type will consume the folded type and produce some parameter type as its result. I like to call the latter r (for "result"). So:

foldMaybe :: ... -> Maybe a -> r
foldPair  :: ... -> Pair a b -> r
foldList  :: ... -> List a -> r
foldTree  :: ... -> Tree a -> r
foldBTree :: ... -> BTree a -> r

Second step: in addition to the last argument (the one for the structure), the fold takes as many arguments as the type has constructors. Pair has one constructor and our other examples have two, so:

foldMaybe :: nothing -> just -> Maybe a -> r
foldPair  :: pair -> Pair a b -> r 
foldList  :: nil -> cons -> List a -> r
foldTree  :: leaf -> branch -> Tree a -> r
foldBTree :: empty -> node -> BTree a -> r

Third step: each of these arguments has the same arity as the constructor it corresponds to. Let's treat the constructors as functions, and write out their types (making sure the type variables match up with the ones in the signatures we're writing):

Nothing :: Maybe a
Just    :: a -> Maybe a

Pair    :: a -> b -> Pair a b

Nil     :: List a
Cons    :: a -> List a -> List a

Leaf    :: a -> Tree a
Branch  :: Tree a -> Tree a -> Tree a

Empty   :: BTree a
Node    :: a -> BTree a -> BTree a -> BTree a

Step 4: in the signature of each constructor, we will replace all occurrences of the data type it constructs with our type variable r (that we're using in our fold signatures):

nothing := r
just    := a -> r

pair    := a -> b -> r

nil     := r
cons    := a -> r -> r

leaf    := a -> r
branch  := r -> r -> r

empty   := r
node    := a -> r -> r -> r

As you can see, I've "assigned" the resulting signatures to my dummy type variables from the second step. Now Step 5: fill those in into the earlier sketch fold signatures:

foldMaybe :: r -> (a -> r) -> Maybe a -> r
foldPair  :: (a -> b -> r) -> Pair a b -> r 
foldList  :: r -> (a -> r -> r) -> List a -> r
foldTree  :: (a -> r) -> (r -> r -> r) -> Tree a -> r
foldBTree :: r -> (a -> r -> r -> r) -> BTree a -> r

Now, these are signatures for the folds of those types. They have a funny argument order, because I did this mechanically by reading the fold type off the data declarations and constructor types, but for some reason in functional programming it's conventional to put base cases first in data definitions yet recursive case handlers first in fold definitions. No problem! Let's reshuffle them to make them more conventional:

foldMaybe :: (a -> r) -> r -> Maybe a -> r
foldPair  :: (a -> b -> r) -> Pair a b -> r 
foldList  :: (a -> r -> r) -> r -> List a -> r
foldTree  :: (r -> r -> r) -> (a -> r) -> Tree a -> r
foldBTree :: (a -> r -> r -> r) -> r -> BTree a -> r


The definitions can also be filled in mechanically. Let's pick foldBTree and implement it step by step. The fold for a given type is the one function of the type we figured out that meets this condition: folding with the type's constructors is an identity function over that type (you get the same result as the value you started with).

We'll start like this:

foldBTree :: (a -> r -> r -> r) -> r -> BTree a -> r
foldBTree = ???

We know it takes three arguments, so we can add variables to reflect them. I'll use long descriptive names:

foldBTree :: (a -> r -> r -> r) -> r -> BTree a -> r
foldBTree branch empty tree = ???

Looking at the data declaration, we know BTree has two possible constructors. We can split the definition into a case for each, and fill out variables for their elements:

foldBTree :: (a -> r -> r -> r) -> r -> BTree a -> r
foldBTree branch empty Empty = ???
foldBTree branch empty (Branch a l r) = ???
    -- Let's use comments to keep track of the types:
    -- a :: a
    -- l, r :: BTree a

Now, short of something like undefined, the only way to fill in the first equation is with empty:

foldBTree :: (a -> r -> r -> r) -> r -> BTree a -> r
foldBTree branch empty Empty = empty
foldBTree branch empty (Branch a l r) = ???
    -- a :: a
    -- l, r :: BTree a

How do we fill in the second equation? Again, short of undefined, we have this:

branch :: a -> r -> r -> r
a      :: a
l, r   :: BTree a

If we had subfold :: BTree a -> r, we could do branch a (subfold l) (subfold r) :: r. But of course, we can write 'subfold' easily:

foldBTree :: (a -> r -> r -> r) -> r -> BTree a -> r
foldBTree branch empty Empty = empty
foldBTree branch empty (Branch a l r) = branch a (subfold l) (subfold r)
    where subfold = foldBTree branch empty

This is the fold for BTree, because foldBTree Branch Empty anyTree == anyTree. Note that foldBTree isn't the only function of this type; there's also this:

mangleBTree :: (a -> r -> r -> r) -> r -> BTree a -> r
mangleBTree branch empty Empty = empty
mangleBTree branch empty (Branch a l r) = branch a (submangle r) (submangle l)
    where submangle = mangleBTree branch empty

But in general, mangleBTree doesn't have the required property; for example if we have foo = Branch 1 (Branch 2 Empty Empty) Empty, it follows that mangleBTree Branch Empty foo /= foo. So mangleBTree, though it has the correct type, is not the fold.


Now, let's take a step back from details, and concentrate on that last point with the mangleTree example. A fold (in the structural sense, #2 at the top of my answer) is nothing more and nothing else than the simplest, non-trivial function for an algebraic type such that, when you give pass in the type's constructors as its arguments, it becomes the identity function for that type. (By nontrivial I mean that things like foo f z xs = xs is not allowed.)

This is very significant. Two ways I like to think about it are as follows:

  1. The fold for a given type can "see" all the information contained by any value of that type. (This is why it's able to perfectly "reconstruct" any value of that type from the ground up using the type's constructors.)
  2. The fold is the most general "consumer" function for that type. Any function that consumes a value of the type in question can be written so that the only operations it uses from that type are the fold and the constructors. (Though the fold-only versions of some functions are hard to write and perform badly; try writing tail :: [a] -> [a] with foldr, (:) and [] as a painful exercise.)

And the second point goes even further, in that you don't even need the constructors. You can implement any algebraic type without using data declarations or constructors, using nothing but folds:

{-# LANGUAGE RankNTypes #-}

-- | A Church-encoded list is a function that takes the two 'foldr' arguments
-- and produces a result from them.
newtype ChurchList a = 
    ChurchList { runList :: forall r. 
                            (a -> r -> r)  -- ^ first arg of 'foldr'
                         -> r              -- ^ second arg of 'foldr'
                         -> r              -- ^ 'foldr' result
               }

-- | Convenience function: make a ChurchList out of a regular list
toChurchList :: [a] -> ChurchList a
toChurchList xs = ChurchList (\kons knil -> foldr kons knil xs)

-- | 'toChurchList' isn't actually needed, however, we can make do without '[]'
-- completely.
cons :: a -> ChurchList a -> ChurchList a
cons x xs = ChurchList (\f z -> f x (runlist xs f z))

nil :: ChurchList a
nil = ChurchList (\f z -> z)

foldr' :: (a -> r -> r) -> r -> ChurchList a -> r
foldr' f z xs = runList xs f z

head :: ChurchList a -> Maybe a
head = foldr' ((Just .) . const) Nothing

append :: ChurchList a -> ChurchList a -> ChurchList a
append xs ys = foldr' cons ys xs

-- | Convert a 'ChurchList' to a regular list.
fromChurchList :: ChurchList a -> [a]
fromChurchList xs = runList xs (:) []

As an exercise you can try writing other types in this way (which uses the RankNTypes extension—read this for a primer). This technique is called Church encoding, and is sometimes useful in actual programming—for example, GHC uses something called foldr/build fusion to optimize list code to remove intermediate lists; see this Haskell Wiki page, and note the type of build:

build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []

Except for the newtype, this is the same as my fromChurchList above. Basically, one of the rules that GHC uses to optimize list processing code is this:

-- Don't materialize the list if all we're going to do with it is
-- fold it right away:
foldr kons knil (fromChurchList xs) ==> runChurchList xs kons knil

By implementing the basic list functions to use Church encodings internally, inlining their definitions aggressively, and applying this rule to the inlined code, nested uses of functions like map can be fused into a tight loop.

这篇关于什么构成了除列表之外的其他类型的折叠?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆