滥用代数数据类型的代数 - 为什么这样做? [英] Abusing the algebra of algebraic data types - why does this work?

查看:25
本文介绍了滥用代数数据类型的代数 - 为什么这样做?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

代数数据类型的代数"表达式对于具有数学背景的人来说非常具有启发性.让我试着解释一下我的意思.

The 'algebraic' expression for algebraic data types looks very suggestive to someone with a background in mathematics. Let me try to explain what I mean.

定义基本类型

  • 产品
  • 联合+
  • 单例X
  • 单元 1

并使用速记 表示 X•X2X 表示 X+X 等等,我们然后可以定义代数表达式,例如链表

and using the shorthand for X•X and 2X for X+X et cetera, we can then define algebraic expressions for e.g. linked lists

data List a = Nil |Cons a (List a)L = 1 + X • L

和二叉树:

数据树 a = Nil |分支a(树a)(树a)T = 1 + X • T²

现在,作为数学家,我的第一直觉是对这些表达式发疯,并尝试求解 LT.我可以通过重复替换来做到这一点,但似乎更容易可怕地滥用符号并假装我可以随意重新排列它.例如,对于链表:

Now, my first instinct as a mathematician is to go nuts with these expressions, and try to solve for L and T. I could do this through repeated substitution, but it seems much easier to abuse the notation horrifically and pretend I can rearrange it at will. For example, for a linked list:

L = 1 + X • L

(1 - X) • L = 1

L = 1/(1 - X) = 1 + X + X² + X³ + ...

我以一种完全不合理的方式使用了 1/(1 - X) 的幂级数展开来推导出一个有趣的结果,即 L 类型要么是Nil,要么包含1个元素,要么包含2个元素,或者3个,等等.

where I've used the power series expansion of 1 / (1 - X) in a totally unjustified way to derive an interesting result, namely that an L type is either Nil, or it contains 1 element, or it contains 2 elements, or 3, etc.

如果我们对二叉树这样做会更有趣:

It gets more interesting if we do it for binary trees:

T = 1 + X • T²

X • T² - T + 1 = 0

T = (1 - √(​​1 - 4 • X))/(2 • X)

T = 1 + X + 2 • X² + 5 • X³ + 14 • X⁴ + ...

再次使用幂级数展开式(使用 Wolfram Alpha).这表达了一个不明显的(对我而言)事实,即只有一棵具有 1 个元素的二叉树、2 棵具有两个元素的二叉树(第二个元素可以在左侧或右侧分支上)、5 个具有三个元素的二叉树等.

again, using the power series expansion (done with Wolfram Alpha). This expresses the non-obvious (to me) fact that there is only one binary tree with 1 element, 2 binary trees with two elements (the second element can be on the left or the right branch), 5 binary trees with three elements etc.

所以我的问题是 - 我在这里做什么?这些操作似乎不合理(无论如何,代数数据类型的平方根到底是什么?)但它们会导致合理的结果.两种代数数据类型的商在计算机科学中有什么意义,还是只是符号上的把戏?

So my question is - what am I doing here? These operations seem unjustified (what exactly is the square root of an algebraic data type anyway?) but they lead to sensible results. does the quotient of two algebraic data types have any meaning in computer science, or is it just notational trickery?

而且,也许更有趣的是,是否可以扩展这些想法?是否有类型代数理论允许,例如,类型上的任意函数,或者类型是否需要幂级数表示?如果可以定义一个函数类,那么函数的组合还有意义吗?

And, perhaps more interestingly, is it possible to extend these ideas? Is there a theory of the algebra of types that allows, for example, arbitrary functions on types, or do types require a power series representation? If you can define a class of functions, then does composition of functions have any meaning?

推荐答案

免责声明:当您考虑 ⊥ 时,很多这些都不太正确,所以为了简单起见,我将公然无视.

Disclaimer: A lot of this doesn't really work quite right when you account for ⊥, so I'm going to blatantly disregard that for the sake of simplicity.

一些初始点:

  • 请注意,联合"在这里可能不是 A+B 的最佳术语——特别是 a disjoint union 这两种类型,因为即使它们的类型相同,两侧也是有区别的.就其价值而言,更常见的术语就是总和类型".

  • Note that "union" is probably not the best term for A+B here--that's specifically a disjoint union of the two types, because the two sides are distinguished even if their types are the same. For what it's worth, the more common term is simply "sum type".

Singleton 类型实际上是所有单元类型.它们在代数运算下表现相同,更重要的是,仍然保留了存在的信息量.

Singleton types are, effectively, all unit types. They behave identically under algebraic manipulations and, more importantly, the amount of information present is still preserved.

您可能也需要零类型.Haskell 将其提供为 Void.没有类型为零的值,就像只有一个类型为一的值一样.

You probably want a zero type as well. Haskell provides that as Void. There are no values whose type is zero, just as there is one value whose type is one.

这里还缺少一个主要的操作,但我稍后再讲.

There's still one major operation missing here but I'll get back to that in a moment.

您可能已经注意到,Haskell 倾向于从范畴论中借用概念,并且上述所有内容都有一个非常直接的解释:

As you've probably noticed, Haskell tends to borrow concepts from Category Theory, and all of the above has a very straightforward interpretation as such:

  • 给定 Hask 中的对象 A 和 B,它们的乘积 A×B 是唯一的(最多同构)类型,允许两个投影 fst : A×B → A 和 snd : A×B → B,其中给定任何类型 C 和函数 f : C → A, g : C →B 你可以定义配对 f &&&g : C → A×B 使得 fst ∘ (f &&& g) = f 并且对于 g.参数化自动保证通用属性,我对名称的不太微妙的选择应该给你这个想法.顺便说一下,(&&&) 运算符是在 Control.Arrow 中定义的.

  • Given objects A and B in Hask, their product A×B is the unique (up to isomorphism) type that allows two projections fst : A×B → A and snd : A×B → B, where given any type C and functions f : C → A, g : C → B you can define the pairing f &&& g : C → A×B such that fst ∘ (f &&& g) = f and likewise for g. Parametricity guarantees the universal properties automatically and my less-than-subtle choice of names should give you the idea. The (&&&) operator is defined in Control.Arrow, by the way.

上面的对偶是 A+B 与注射 inl : A → A+B 和 inr : B → A+B,其中给定任何类型 C 和函数 f : A → C, g : B → C,您可以定义配对 f |||g : A+B → C 使得明显的等价性成立.同样,参数化自动保证了大多数棘手的部分.在这种情况下,标准注入是简单的 LeftRight,而copairing 是函数 either.

The dual of the above is the coproduct A+B with injections inl : A → A+B and inr : B → A+B, where given any type C and functions f : A → C, g : B → C, you can define the copairing f ||| g : A+B → C such that the obvious equivalences hold. Again, parametricity guarantees most of the tricky parts automatically. In this case, the standard injections are simply Left and Right and the copairing is the function either.

乘积和求和类型的许多属性都可以从上面推导出来.请注意,任何单例类型都是 Hask 的终端对象,任何空类型都是初始对象.

Many of the properties of product and sum types can be derived from the above. Note that any singleton type is a terminal object of Hask and any empty type is an initial object.

回到前面提到的缺失操作,在笛卡尔封闭类别中,你有指数对象,对应于类别的箭头.我们的箭头是函数,我们的对象是类型为 * 的类型,类型为 A ->B 在类型的代数操作上下文中确实表现为 BA.如果不明显为什么这应该成立,请考虑类型 Bool ->一个.只有两个可能的输入,该类型的函数与 A 类型的两个值同构,即 (A, A).对于 Maybe Bool ->A 我们有三个可能的输入,依此类推.另外,请注意,如果我们将上面的配对定义改写为使用代数符号,我们会得到等式 CA × CB = CA+B.

Returning to the aforementioned missing operation, in a cartesian closed category you have exponential objects that correspond to arrows of the category. Our arrows are functions, our objects are types with kind *, and the type A -> B indeed behaves as BA in the context of algebraic manipulation of types. If it's not obvious why this should hold, consider the type Bool -> A. With only two possible inputs, a function of that type is isomorphic to two values of type A, i.e. (A, A). For Maybe Bool -> A we have three possible inputs, and so on. Also, observe that if we rephrase the copairing definition above to use algebraic notation, we get the identity CA × CB = CA+B.

至于为什么这一切都是有道理的——特别是为什么你使用幂级数展开是合理的——请注意,上面的大部分内容都是指一种类型的居民"(即具有该类型的不同值)以证明代数行为.为了明确这个观点:

As for why this all makes sense--and in particular why your use of the power series expansion is justified--note that much of the above refers to the "inhabitants" of a type (i.e., distinct values having that type) in order to demonstrate the algebraic behavior. To make that perspective explicit:

  • 产品类型(A, B) 代表分别来自AB 的值,它们是独立的.因此对于任何固定值a :: A,对于B 的每个居民都有一个(A, B) 类型的值.这当然是笛卡尔积,乘积类型的居民人数是因子的居民人数的乘积.

  • The product type (A, B) represents a value each from A and B, taken independently. So for any fixed value a :: A, there is one value of type (A, B) for each inhabitant of B. This is of course the cartesian product, and the number of inhabitants of the product type is the product of the number of inhabitants of the factors.

和类型Either A B 表示来自AB 的值,并区分左右分支.如前所述,这是一个不相交的并集,和类型的居民数是被加数的居民数之和.

The sum type Either A B represents a value from either A or B, with the left and right branches distinguished. As mentioned earlier, this is a disjoint union, and the number of inhabitants of the sum type is the sum of the number of inhabitants of the summands.

指数型B ->A 表示从 B 类型的值到 A 类型的值的映射.对于任何固定参数b :: BA的任何值都可以赋值给它;B 类型的值 ->A 为每个输入选择一个这样的映射,这相当于 A 的副本与 B 拥有的居民一样多,因此取幂.p>

The exponential type B -> A represents a mapping from values of type B to values of type A. For any fixed argument b :: B, any value of A can be assigned to it; a value of type B -> A picks one such mapping for each input, which is equivalent to a product of as many copies of A as B has inhabitants, hence the exponentiation.

虽然起初将类型视为集合很诱人,但在这种情况下这实际上并不能很好地工作——我们有不相交的并集而​​不是标准的集合并集,没有对交集或许多其他集合操作的明显解释,并且我们通常不关心集合成员资格(将其留给类型检查器).

While it's tempting at first to treat types as sets, that doesn't actually work very well in this context--we have disjoint union rather than the standard union of sets, there's no obvious interpretation of intersection or many other set operations, and we don't usually care about set membership (leaving that to the type checker).

另一方面,上面的结构花了很多时间讨论计数居民,并且枚举一个类型的可能值在这里是一个有用的概念.这很快将我们引向枚举组合,如果您查阅链接的维基百科文章,您会发现它所做的第一件事就是通过 生成函数,然后使用与您所做的完全相同的技术对与 Haskell 列表相同的序列"执行相同的操作.

On the other hand, the constructions above spend a lot of time talking about counting inhabitants, and enumerating the possible values of a type is a useful concept here. That quickly leads us to enumerative combinatorics, and if you consult the linked Wikipedia article you'll find that one of the first things it does is define "pairs" and "unions" in exactly the same sense as product and sum types by way of generating functions, then does the same for "sequences" that are identical to Haskell's lists using exactly the same technique you did.

哦,这里有一个快速奖励,我认为它非常显着地证明了这一点.您在评论中提到,对于树类型 T = 1 + T^2,您可以导出标识 T^6 = 1,这显然是错误的.但是,T^7 = T 确实成立,并且可以直接构造树和树的七元组之间的双射,参见.Andreas Blass 的七棵树合一".

Oh, and here's a quick bonus that I think demonstrates the point strikingly. You mentioned in a comment that for a tree type T = 1 + T^2 you can derive the identity T^6 = 1, which is clearly wrong. However, T^7 = T does hold, and a bijection between trees and seven-tuples of trees can be constructed directly, cf. Andreas Blass's "Seven Trees in One".

编辑×2:关于其他答案中提到的类型导数"构造的主题,您可能也喜欢来自同一作者的这篇论文 进一步建立在这个想法的基础上,包括除法的概念和其他有趣的东西.

Edit×2: On the subject of the "derivative of a type" construction mentioned in other answers, you might also enjoy this paper from the same author which builds on the idea further, including notions of division and other interesting whatnot.

这篇关于滥用代数数据类型的代数 - 为什么这样做?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆