将类型与数据构造函数相关联的 ADT 编码有哪些问题?(例如 Scala.) [英] What are the problems with an ADT encoding that associates types with data constructors? (Such as Scala.)

查看:12
本文介绍了将类型与数据构造函数相关联的 ADT 编码有哪些问题?(例如 Scala.)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在 Scala 中,代数数据类型被编码为 sealed 一级类型层次结构.示例:

-- Haskell数据定位 a = 追加|AppendIf (a -> Bool)|显式 ([a] -> [a])

//Scala密封特质定位[A]case object Append extends Positioning[Nothing]case class AppendIf[A](condition: A => Boolean) extends Positioning[A]case class Explicit[A](f: Seq[A] => Seq[A]) extends Positioning[A]

使用case classes 和case objects,Scala 生成了一堆东西,比如equalshashCodeunapply(由模式匹配使用)等,为我们带来了传统 ADT 的许多关键属性和特性.

不过,有一个关键区别 - 在 Scala 中,数据构造函数"有自己的类型.例如比较以下两个(从各自的 REPL 中复制).

//Scala标度>:t 追加附加类型标度>:t AppendIf[Int](Function const true)AppendIf[Int]-- 哈斯克尔哈斯克尔>:t 追加附加 :: 定位一个哈斯克尔>:t AppendIf (const True)AppendIf (const True) :: 定位一个

<小时>

我一直认为 Scala 变体是有利的.

毕竟,没有丢失类型信息.AppendIf[Int] 例如是 Positioning[Int] 的子类型.

scala>val subtypeProof = 隐式[AppendIf[Int] <:<定位[Int]]subtypeProof: <:<[AppendIf[Int],Positioning[Int]] = <function1>

事实上,你会得到一个额外的关于该值的编译时间不变量.(我们可以将其称为依赖类型的受限版本吗?)

这可以很好地利用——一旦你知道使用什么数据构造函数来创建一个值,相应的类型就可以通过流程的其余部分传播,以增加更多的类型安全性.例如,使用这种 Scala 编码的 Play JSON 将只允许您从 JsObject 中提取 fields,而不能从任何任意 JsValue 中提取.

scala>导入 play.api.libs.json._导入 play.api.libs.json._标度>val obj = Json.obj("key" -> 3)对象:play.api.libs.json.JsObject = {"key":3}标度>对象字段res0: Seq[(String, play.api.libs.json.JsValue)] = ArrayBuffer((key,3))标度>val arr = Json.arr(3, 4)arr: play.api.libs.json.JsArray = [3,4]标度>arr.fields<console>:15: 错误:值字段不是 play.api.libs.json.JsArray 的成员arr.fields^标度>val jsons = Set(obj, arr)jsons: scala.collection.immutable.Set[带有可序列化的产品 play.api.libs.json.JsValue] = Set({"key":3}, [3,4])

在 Haskell 中,fields 的类型可能是 JsValue ->设置(字符串,JsValue).这意味着它会在运行时因 JsArray 等而失败.这个问题也以众所周知的部分记录访问器的形式表现出来.

Scala 对数据构造函数的处理是错误的观点已经被多次表达过——在 Twitter、邮件列表、IRC、SO 等上.不幸的是,我没有任何这些的链接,除了一对夫妇 - 这个答案由 Travis Brown 和 Argonaut,一个用于 Scala 的纯函数式 JSON 库.

Argonaut 有意识地采用 Haskell 方法(由 privateing case 类,并手动提供数据构造函数).你可以看到我提到的 Haskell 编码问题也存在于 Argonaut 中.(除非它使用 Option 来表示偏袒.)

scala>进口 argonaut._, Argonaut._进口 argonaut._进口Argonaut._标度>val obj = Json.obj("k":= 3)obj: argonaut.Json = {"k":3}标度>obj.obj.map(_.toList)res6: Option[List[(argonaut.Json.JsonField, argonaut.Json)]] = Some(List((k,3)))标度>val arr = Json.array(jNumber(3), jNumber(4))arr: argonaut.Json = [3,4]标度>arr.obj.map(_.toList)res7: 选项[列表[(argonaut.Json.JsonField, argonaut.Json)]] = 无

这个问题我想了很久,还是不明白是什么让Scala的编码出错了.当然它有时会妨碍类型推断,但这似乎不是一个足够强大的理由来判定它是错误的.我错过了什么?

解决方案

据我所知,Scala 对 case 类的惯用编码可能不好的原因有两个:类型推断和类型特异性.前者是句法方便的问题,后者是推理范围扩大的问题.

子类型问题比较容易说明:

val x = Some(42)

x 的类型原来是 Some[Int],这可能不是你想要的.您可以在其他问题较多的领域产生类似的问题:

密封 trait ADT案例类 Case1(x: Int) 扩展 ADT案例类 Case2(x: String) 扩展 ADTval xs = List(Case1(42), Case1(12))

xs 的类型是 List[Case1].这基本上保证不是您想要的.为了解决这个问题,像 List 这样的容器需要在它们的类型参数中是协变的.不幸的是,协变引入了一大堆问题,实际上降低了某些结构的健全性(例如,Scalaz 通过允许协变容器对其 Monad 类型和几个 monad 转换器做出妥协,尽管事实上它是不健全的这样做).

因此,以这种方式编码 ADT 会对您的代码产生某种程度的病毒影响.您不仅需要处理 ADT 本身中的子类型,而且每个您编写的容器都需要考虑这样一个事实,即您在不合时宜的时候登陆 ADT 的子类型.>

不使用公共案例类对 ADT 进行编码的第二个原因是为了避免非类型"使您的类型空间混乱.从某种角度来看,ADT 案例并不是真正的类型:它们是数据.如果您以这种方式推理 ADT(这并没有错!),那么为每个 ADT 案例提供一流的类型会增加您在推理代码时需要牢记的事情.

例如,考虑上面的 ADT 代数.如果你想推理使用这个 ADT 的代码,你需要不断地思考好吧,如果这个类型是 Case1 怎么办?"这不是任何人真正需要问的问题,因为 Case1 是数据.它是特定联产品案例的标签.仅此而已.

就我个人而言,我不太关心上述任何一项.我的意思是,协方差的不健全问题是真实的,但我通常更喜欢让我的容器保持不变并指示我的用户吸收它并注释你的类型".这很不方便而且很笨,但我发现它比替代方案更可取,后者有很多样板折叠和小写"数据构造函数.

作为通配符,这种类型特定性的第三个潜在缺点是它鼓励(或者更确切地说,允许)一种更加面向对象"的风格,您可以在单个 ADT 类型上放置特定于案例的函数.我认为以这种方式混合你的隐喻(案例类与子类型多态性)是一种糟糕的方法,这几乎没有问题.然而,这种结果是否是类型案例的错是一个悬而未决的问题.

In Scala, algebraic data types are encoded as sealed one-level type hierarchies. Example:

-- Haskell
data Positioning a = Append
                   | AppendIf (a -> Bool)
                   | Explicit ([a] -> [a]) 

// Scala
sealed trait Positioning[A]
case object Append extends Positioning[Nothing]
case class AppendIf[A](condition: A => Boolean) extends Positioning[A]
case class Explicit[A](f: Seq[A] => Seq[A]) extends Positioning[A]

With case classes and case objects, Scala generates a bunch of things like equals, hashCode, unapply (used by pattern matching) etc that brings us many of the key properties and features of traditional ADTs.

There is one key difference though – In Scala, "data constructors" have their own types. Compare the following two for example (Copied from the respective REPLs).

// Scala

scala> :t Append
Append.type

scala> :t AppendIf[Int](Function const true)
AppendIf[Int]

-- Haskell

haskell> :t Append
Append :: Positioning a

haskell> :t AppendIf (const True)
AppendIf (const True) :: Positioning a


I have always considered the Scala variation to be on the advantageous side.

After all, there is no loss of type information. AppendIf[Int] for instance is a subtype of Positioning[Int].

scala> val subtypeProof = implicitly[AppendIf[Int] <:< Positioning[Int]]
subtypeProof: <:<[AppendIf[Int],Positioning[Int]] = <function1>

In fact, you get an additional compile time invariant about the value. (Could we call this a limited version of dependent typing?)

This can be put to good use – Once you know what data constructor was used to create a value, the corresponding type can be propagated through rest of the flow to add more type safety. For example, Play JSON, which uses this Scala encoding, will only allow you to extract fields from JsObject, not from any arbitrary JsValue.

scala> import play.api.libs.json._
import play.api.libs.json._

scala> val obj = Json.obj("key" -> 3)
obj: play.api.libs.json.JsObject = {"key":3}

scala> obj.fields
res0: Seq[(String, play.api.libs.json.JsValue)] = ArrayBuffer((key,3))

scala> val arr = Json.arr(3, 4)
arr: play.api.libs.json.JsArray = [3,4]

scala> arr.fields
<console>:15: error: value fields is not a member of play.api.libs.json.JsArray
              arr.fields
                  ^

scala> val jsons = Set(obj, arr)
jsons: scala.collection.immutable.Set[Product with Serializable with play.api.libs.json.JsValue] = Set({"key":3}, [3,4])

In Haskell, fields would probably have type JsValue -> Set (String, JsValue). Which means it will fail at runtime for a JsArray etc. This problem also manifests in the form of well known partial record accessors.

The view that Scala's treatment of data constructors is wrong has been expressed numerous times – on Twitter, mailing lists, IRC, SO etc. Unfortunately I don't have links to any of those, except for a couple - this answer by Travis Brown, and Argonaut, a purely functional JSON library for Scala.

Argonaut consciously takes the Haskell approach (by privateing case classes, and providing data constructors manually). You can see that the problem I mentioned with Haskell encoding exists with Argonaut as well. (Except it uses Option to indicate partiality.)

scala> import argonaut._, Argonaut._
import argonaut._
import Argonaut._

scala> val obj = Json.obj("k" := 3)
obj: argonaut.Json = {"k":3}

scala> obj.obj.map(_.toList)
res6: Option[List[(argonaut.Json.JsonField, argonaut.Json)]] = Some(List((k,3)))

scala> val arr = Json.array(jNumber(3), jNumber(4))
arr: argonaut.Json = [3,4]

scala> arr.obj.map(_.toList)
res7: Option[List[(argonaut.Json.JsonField, argonaut.Json)]] = None

I have been pondering this for quite some time, but still do not understand what makes Scala's encoding wrong. Sure it hampers type inference at times, but that does not seem like a strong enough reason to decree it wrong. What am I missing?

解决方案

To the best of my knowledge, there are two reasons why Scala's idiomatic encoding of case classes can be bad: type inference, and type specificity. The former is a matter of syntactic convenience, while the latter is a matter of increased scope of reasoning.

The subtyping issue is relatively easy to illustrate:

val x = Some(42)

The type of x turns out to be Some[Int], which is probably not what you wanted. You can generate similar issues in other, more problematic areas:

sealed trait ADT
case class Case1(x: Int) extends ADT
case class Case2(x: String) extends ADT

val xs = List(Case1(42), Case1(12))

The type of xs is List[Case1]. This is basically guaranteed to be not what you want. In order to get around this issue, containers like List need to be covariant in their type parameter. Unfortunately, covariance introduces a whole bucket of issues, and in fact degrades the soundness of certain constructs (e.g. Scalaz compromises on its Monad type and several monad transformers by allowing covariant containers, despite the fact that it is unsound to do so).

So, encoding ADTs in this fashion has a somewhat viral effect on your code. Not only do you need to deal with subtyping in the ADT itself, but every container you ever write needs to take into account the fact that you're landing on subtypes of your ADT at inopportune moments.

The second reason not to encode your ADTs using public case classes is to avoid cluttering up your type space with "non-types". From a certain perspective, ADT cases are not really types: they are data. If you reason about ADTs in this fashion (which is not wrong!), then having first-class types for each of your ADT cases increases the set of things you need to carry in your mind to reason about your code.

For example, consider the ADT algebra from above. If you want to reason about code which uses this ADT, you need to be constantly thinking about "well, what if this type is Case1?" That just not a question anyone really needs to ask, since Case1 is data. It's a tag for a particular coproduct case. That's all.

Personally, I don't care much about any of the above. I mean, the unsoundness issues with covariance are real, but I generally just prefer to make my containers invariant and instruct my users to "suck it up and annotate your types". It's inconvenient and it's dumb, but I find it preferable to the alternative, which is a lot of boilerplate folds and "lower-case" data constructors.

As a wildcard, a third potential disadvantage to this sort of type specificity is it encourages (or rather, allows) a more "object-oriented" style where you put case-specific functions on the individual ADT types. I think there is very little question that mixing your metaphors (case classes vs subtype polymorphism) in this way is a recipe for bad. However, whether or not this outcome is the fault of typed cases is sort of an open question.

这篇关于将类型与数据构造函数相关联的 ADT 编码有哪些问题?(例如 Scala.)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆