将类型与数据构造函数关联的ADT编码有什么问题? (如Scala。) [英] What are the problems with an ADT encoding that associates types with data constructors? (Such as Scala.)

查看:165
本文介绍了将类型与数据构造函数关联的ADT编码有什么问题? (如Scala。)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在Scala中,代数数据类型被编码为 sealed 单层类型层次结构。例子:
$ b

- Haskell
data定位a =追加
| AppendIf(a - > Bool)
| Explicit([a] - > [a])



  // Scala 
密码特征定位[A]
case对象Append extends Positioning [Nothing]
case class AppendIf [A](condition:A => Boolean )extends Positioning [A]
case class Explicit [A](f:Seq [A] => Seq [A])extends Positioning [A]

使用 case class es和 case object s, Scala会生成一堆如 equals hashCode unapply (用于模式匹配)等等,它为我们带来了许多传统ADT的关键属性和特性。



虽然有一个关键区别 - 在Scala中, 数据构造函数有它们自己的类型。比较以下两个例子(从各个REPL复制)。


$ b

  // Scala 

scala> :t追加
Append.type

scala> :t AppendIf [Int](函数常量true)
AppendIf [Int]

- Haskell

haskell> :t追加
追加::定位

haskell> :t AppendIf(const True)
AppendIf(const True)::定位






我一直认为Scala的变化是有利的一面。

毕竟,没有损失类型信息 AppendIf [Int] 例如是 Positioning [Int] 的子类型。

scala> val subtypeProof =隐式[AppendIf [Int]<:<定位[Int]]
subtypeProof:<:< [AppendIf [Int],Positioning [Int]] =< function1>

事实上,对于值,您会得到额外的编译时间不变。 (我们可以称这是依赖打字的一个有限版本吗?)

这可以很好地使用 - 一旦你知道用什么数据构造函数来创建一个值,相应的类型可以通过其余的流传播以增加更多类型的安全性。例如,使用Scala编码的Play JSON将只允许您从 JsObject 中提取字段,而不是任何任意 JsValue


$ b

  scala> import play.api.libs.json._ 
import play.api.libs.json._

scala> val obj = Json.obj(key - > 3)
obj:play.api.libs.json.JsObject = {key:3}

scala> obj.fields
res0:Seq [(String,play.api.libs.json.JsValue)] = ArrayBuffer((key,3))

scala> val arr = Json.arr(3,4)
arr:play.api.libs.json.JsArray = [3,4]

scala> arr.fields
< console>:15:error:value字段不是play.api.libs.json.JsArray的成员
arr.fields
^

scala> val jsons = Set(obj,arr)
jsons:scala.collection.immutable.Set [Product with Serializable with play.api.libs.json.JsValue] = Set({key:3},[3 ,4])

在Haskell中,字段可能会输入 JsValue - > Set(String,JsValue)。这意味着它会在运行时失败 JsArray 等。这个问题也以众所周知的部分记录访问器的形式出现。

Scala对数据构造函数的处理方式错误的观点已经被无数次地表达过了 - 在Twitter,邮件列表,IRC,SO等等上。不幸的是,我没有任何链接,除了一对 - 这个答案由Travis Brown和 Argonaut 开发,这是一个纯粹的Scala功能JSON库。



Argonaut 自觉地采用Haskell方法(由 private 个案类,并手动提供数据构造函数)。你可以看到我用Haskell编码提到的问题也存在于Argonaut中。 (除了使用 Option 来表示偏好。)

阶>进口argonaut._,Argonaut._
进口argonaut._
进口Argonaut._

斯卡拉> val obj = Json.obj(k:= 3)
obj:argonaut.Json = {k:3}

scala> obj.obj.map(_。toList)
res6:Option [List [(argonaut.Json.JsonField,argonaut.Json)]] = Some(List((k,3)))

scala> val arr = Json.array(jNumber(3),jNumber(4))
arr:argonaut.Json = [3,4]

scala> arr.obj.map(_。toList)
res7:Option [List [(argonaut.Json.JsonField,argonaut.Json)]] = None

我一直在思考这个问题很长一段时间,但仍然不明白是什么让Scala的编码错误。当然,它有时妨碍了类型推断,但这似乎不足以证明它错误。我错过了什么?

解决方案

据我所知,Scala对案例类的惯用编码有两个原因不好:类型推断和类型特异性。前者是句法方便的问题,后者则是推理范围扩大的问题。

子类型问题相对容易说明:

  val x =一些(42)

类型 x 原来是某些[Int] ,这可能不是你想要的。您可以在其他问题较多的领域生成类似的问题:

密封特质ADT
案例类Case1(x:Int)扩展ADT
案例类Case2(x:String)扩展ADT
$ b $ val xs = List(Case1(42),Case1(12))

xs 的类型是列表[案例1] 。这基本上是保证不是你想要的。为了解决这个问题,像 List 这样的容器需要在它们的类型参数中协变。不幸的是,协变引入了一大堆问题,实际上降低了某些构造的稳健性(例如,通过允许协变容器,尽管Scalaz在它的 Monad 类型和几个monad变换器上妥协了,尽管这是不正确的。)

因此,以这种方式对ADT进行编码对您的代码有一定的病毒影响。您不仅需要处理ADT本身的子类型,而且您写过的每个容器都需要考虑到您在不合时宜的地方登陆ADT的子类型的事实。 p>

不使用公共案例类对ADT进行编码的第二个原因是为了避免使用非类型混淆您的类型空间。从某种角度来看,ADT案件并不是真正的类型:它们是数据。如果你以这种方式推理ADT(这没有错!),那么为每个ADT案例分配一流的类型会增加你想要推理代码的一些东西。



例如,考虑上面的 ADT 代数。如果你想推理使用这个ADT的代码,你需要不断思考好吧,如果这个类型是 Case1 ?因为 Case1 是数据,所以这不是一个真正需要 的问题。这是一个特定的coproduct案例的标签。就是这样。



就我个人而言,我不太在意上述任何一点。我的意思是,协方差的不确定性问题是真实的,但我通常只是倾向于使我的容器不变,并指示我的用户吸取它并注释你的类型。这很不方便,也很笨拙,但我发现它更适合替代方案,这是很多样板折叠和小写数据构造函数。



作为通配符,这种类型特殊性的第三个潜在的缺点是鼓励(或者说,允许)更多的面向对象风格,在这些风格中,您将特定于案例的功能放在单独的ADT类型上。我认为,以这种方式混合你的隐喻(case class vs subtype polymorphism)是一个不好的方法。然而,这种结果是否是类型案件的错误是一个悬而未决的问题。

In Scala, algebraic data types are encoded as sealed one-level type hierarchies. Example:

-- Haskell
data Positioning a = Append
                   | AppendIf (a -> Bool)
                   | Explicit ([a] -> [a]) 

// Scala
sealed trait Positioning[A]
case object Append extends Positioning[Nothing]
case class AppendIf[A](condition: A => Boolean) extends Positioning[A]
case class Explicit[A](f: Seq[A] => Seq[A]) extends Positioning[A]

With case classes and case objects, Scala generates a bunch of things like equals, hashCode, unapply (used by pattern matching) etc that brings us many of the key properties and features of traditional ADTs.

There is one key difference though – In Scala, "data constructors" have their own types. Compare the following two for example (Copied from the respective REPLs).

// Scala

scala> :t Append
Append.type

scala> :t AppendIf[Int](Function const true)
AppendIf[Int]

-- Haskell

haskell> :t Append
Append :: Positioning a

haskell> :t AppendIf (const True)
AppendIf (const True) :: Positioning a


I have always considered the Scala variation to be on the advantageous side.

After all, there is no loss of type information. AppendIf[Int] for instance is a subtype of Positioning[Int].

scala> val subtypeProof = implicitly[AppendIf[Int] <:< Positioning[Int]]
subtypeProof: <:<[AppendIf[Int],Positioning[Int]] = <function1>

In fact, you get an additional compile time invariant about the value. (Could we call this a limited version of dependent typing?)

This can be put to good use – Once you know what data constructor was used to create a value, the corresponding type can be propagated through rest of the flow to add more type safety. For example, Play JSON, which uses this Scala encoding, will only allow you to extract fields from JsObject, not from any arbitrary JsValue.

scala> import play.api.libs.json._
import play.api.libs.json._

scala> val obj = Json.obj("key" -> 3)
obj: play.api.libs.json.JsObject = {"key":3}

scala> obj.fields
res0: Seq[(String, play.api.libs.json.JsValue)] = ArrayBuffer((key,3))

scala> val arr = Json.arr(3, 4)
arr: play.api.libs.json.JsArray = [3,4]

scala> arr.fields
<console>:15: error: value fields is not a member of play.api.libs.json.JsArray
              arr.fields
                  ^

scala> val jsons = Set(obj, arr)
jsons: scala.collection.immutable.Set[Product with Serializable with play.api.libs.json.JsValue] = Set({"key":3}, [3,4])

In Haskell, fields would probably have type JsValue -> Set (String, JsValue). Which means it will fail at runtime for a JsArray etc. This problem also manifests in the form of well known partial record accessors.

The view that Scala's treatment of data constructors is wrong has been expressed numerous times – on Twitter, mailing lists, IRC, SO etc. Unfortunately I don't have links to any of those, except for a couple - this answer by Travis Brown, and Argonaut, a purely functional JSON library for Scala.

Argonaut consciously takes the Haskell approach (by privateing case classes, and providing data constructors manually). You can see that the problem I mentioned with Haskell encoding exists with Argonaut as well. (Except it uses Option to indicate partiality.)

scala> import argonaut._, Argonaut._
import argonaut._
import Argonaut._

scala> val obj = Json.obj("k" := 3)
obj: argonaut.Json = {"k":3}

scala> obj.obj.map(_.toList)
res6: Option[List[(argonaut.Json.JsonField, argonaut.Json)]] = Some(List((k,3)))

scala> val arr = Json.array(jNumber(3), jNumber(4))
arr: argonaut.Json = [3,4]

scala> arr.obj.map(_.toList)
res7: Option[List[(argonaut.Json.JsonField, argonaut.Json)]] = None

I have been pondering this for quite some time, but still do not understand what makes Scala's encoding wrong. Sure it hampers type inference at times, but that does not seem like a strong enough reason to decree it wrong. What am I missing?

解决方案

To the best of my knowledge, there are two reasons why Scala's idiomatic encoding of case classes can be bad: type inference, and type specificity. The former is a matter of syntactic convenience, while the latter is a matter of increased scope of reasoning.

The subtyping issue is relatively easy to illustrate:

val x = Some(42)

The type of x turns out to be Some[Int], which is probably not what you wanted. You can generate similar issues in other, more problematic areas:

sealed trait ADT
case class Case1(x: Int) extends ADT
case class Case2(x: String) extends ADT

val xs = List(Case1(42), Case1(12))

The type of xs is List[Case1]. This is basically guaranteed to be not what you want. In order to get around this issue, containers like List need to be covariant in their type parameter. Unfortunately, covariance introduces a whole bucket of issues, and in fact degrades the soundness of certain constructs (e.g. Scalaz compromises on its Monad type and several monad transformers by allowing covariant containers, despite the fact that it is unsound to do so).

So, encoding ADTs in this fashion has a somewhat viral effect on your code. Not only do you need to deal with subtyping in the ADT itself, but every container you ever write needs to take into account the fact that you're landing on subtypes of your ADT at inopportune moments.

The second reason not to encode your ADTs using public case classes is to avoid cluttering up your type space with "non-types". From a certain perspective, ADT cases are not really types: they are data. If you reason about ADTs in this fashion (which is not wrong!), then having first-class types for each of your ADT cases increases the set of things you need to carry in your mind to reason about your code.

For example, consider the ADT algebra from above. If you want to reason about code which uses this ADT, you need to be constantly thinking about "well, what if this type is Case1?" That just not a question anyone really needs to ask, since Case1 is data. It's a tag for a particular coproduct case. That's all.

Personally, I don't care much about any of the above. I mean, the unsoundness issues with covariance are real, but I generally just prefer to make my containers invariant and instruct my users to "suck it up and annotate your types". It's inconvenient and it's dumb, but I find it preferable to the alternative, which is a lot of boilerplate folds and "lower-case" data constructors.

As a wildcard, a third potential disadvantage to this sort of type specificity is it encourages (or rather, allows) a more "object-oriented" style where you put case-specific functions on the individual ADT types. I think there is very little question that mixing your metaphors (case classes vs subtype polymorphism) in this way is a recipe for bad. However, whether or not this outcome is the fault of typed cases is sort of an open question.

这篇关于将类型与数据构造函数关联的ADT编码有什么问题? (如Scala。)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆