DSL的GADT:摆动和回旋处? [英] GADTs for a DSL: swings and roundabouts?

查看:68
本文介绍了DSL的GADT:摆动和回旋处?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有关GADT好处的典型示例表示DSL的语法;在Wiki上说此处Show实例或类似的东西:

{-# LANGUAGE  GADTs, StandaloneDeriving  #-}

data Term a where
  Lit :: Int -> Term Int
  Inc :: Term Int -> Term Int
  IsZ :: Term Int -> Term Bool
  If :: Term Bool -> Term a -> Term a -> Term a
  Pair :: (Show a, Show b) => Term a -> Term b -> Term (a,b)
  Fst :: (Show b) => Term (a,b) -> Term a
  Snd :: (Show a) => Term (a,b) -> Term b

deriving instance (Show a) => Show (Term a)

(在我看来,那些在构造函数中缠结的Show约束已经无法分离问题了.)

我更多地考虑的是有人输入DSL表达式的用户体验,而不是程序员对eval函数的方便.要么

  • 用户直接使用GADT构造函数输入表达式.容易在语法上犯错但类型错误(例如,错误放置的括号).然后,GHCi给出了非常不友好的拒绝消息.或者
  • REPL将输入作为文本进行解析.但是对于这样的GADT,获取Read实例是一项艰巨的工作.所以也许
  • 该应用程序具有两种数据结构:一种是允许类型错误的数据结构;另一种是允许类型错误的数据结构.另一个是GADT;如果可以安全地进行类型验证,则validate步骤将构造GADT AST.

在最后一个项目符号上,我似乎还支持智能构造函数",那就是GADT应该在on(?)上有所改进.此外,我在某处将工作加倍了.

我没有更好的方式"来处理它.我想知道如何在实践中处理DSL应用程序. (对于上下文:我正在考虑一个数据库查询环境,其中类型推断必须查看数据库中字段的类型以验证对其执行的操作.)

添加:完成@Alec的回答

我看到glambda中用于漂亮打印的代码涉及几层类和实例.与GADT所声称的AST优势相比,这里有些不对劲. (类型正确的)AST的想法是,您同样可以轻松地进行评估:或漂亮地打印它;或对其进行优化;或从中生成代码;等

glambda似乎是针对评估的(考虑到练习的目的,这很公平).我想知道...

  • 为什么需要在一种数据类型中表达(E)DSL的整个语法? (Wikibook的示例开始用稻草人做data Expr = ...;然后迅速遇到类型麻烦.当然,它确实做得到;这永远行不通;几乎任何事情都比这更好;我感到上当受骗.)

  • 如果我们最终还是写了类和实例,为什么不让每个语法产生一个单独的数据类型:data Lit = Lit Int ... data If b a1 a2 = If b a1 a2 ...然后是class IsTerm a c | a -> c where ...(即FunDep或也许是类型族,其实例告诉我们术语的结果类型.)

  • 现在,EDSL使用相同的构造函数(用户不在乎它们来自不同的数据类型);并且他们应用了草率"的类型检查.漂亮的打印/错误报告也不需要严格的类型检查. Eval会这样做,并坚持将所有IsTerm实例排列在一起.

我以前没有建议过这种方法,因为它似乎涉及太多繁琐的代码.但是实际上,这并不比glambda差,也就是说,当您考虑整个功能时,不仅要考虑评估步骤.

在我看来,仅表达一次语法是一个很大的优势.此外,它似乎更具可扩展性:为每个语法生成添加新的数据类型,而不是打开现有的数据类型.哦,因为它们是H98数据类型(没有存在),所以deriving可以正常工作.

解决方案

请注意,GHCi不使用GADT表示表达式.甚至GHC的内部核心表达式类型 Expr 不是GADT.

DSLs

为了使您的Term类型的例子更加充实,请考虑 glambda .其 Exp type甚至在类型级别上跟踪变量.

  • 还有第二个UExp数据类型,正如您所观察到的那样,它是从REPL中实际解析的数据类型.然后,将这种类型进行类型检入到Exp中,并传递给以下内容的延续:

    check :: (MonadError Doc m, MonadReader Globals m)
          => UExp -> (forall t. STy t -> Exp '[] t -> m r)
          -> m r
    

  • UExpExp的漂亮印刷品是手写的,但至少评估代码本身很漂亮,但我怀疑我是否需要以此为卖点. :)

EDSL

据我了解,GADT对于EDSL(嵌入式DSL)非常出色,因为它们只是大型Haskell程序中代码的一部分.是的,类型错误可能很复杂(并且将直接来自GHC),但这就是您能够在代码中维护类型级不变式所要付出的代价.例如,考虑hoopl在CFG中基本块的表示形式:

data Block n e x where
  BlockCO  :: n C O -> Block n O O          -> Block n C O
  BlockCC  :: n C O -> Block n O O -> n O C -> Block n C C
  BlockOC  ::          Block n O O -> n O C -> Block n O C

  BNil    :: Block n O O
  BMiddle :: n O O                      -> Block n O O
  BCat    :: Block n O O -> Block n O O -> Block n O O
  BSnoc   :: Block n O O -> n O O       -> Block n O O
  BCons   :: n O O       -> Block n O O -> Block n O O

当然,您可以打开讨厌的类型错误,但是您还可以在类型级别跟踪失败提示信息.这使得考虑数据流问题变得更加容易.

那......

我要提出的观点是:如果您的GADT是通过String(或自定义的REPL)构建的,那么执行翻译的时间将会很艰难.这是不可避免的,因为您所做的本质上是重新实现了一个简单的类型检查器.最好的选择是像glambda一样直面这一点,并从类型检查中区分出解析. >

但是,如果您有能力负担在Haskell代码的范围内,则可以只对GHC进行手工解析和类型检查.恕我直言,EDSL比非嵌入式DSL更凉爽,更实用.

The typical examples for the benefits of a GADT are representing the syntax for a DSL; say here on the wiki or the PLDI 2005 paper.

I can see that if you have a AST that's type-correct by construction, writing an eval function is easy.

How to build the GADT handling into a REPL? Or more specifically into a Read-Parse-Typecheck-Eval-Print-Loop? I'm seeing that you just push the complexity from the eval step into earlier steps.

Does GHCi use a GADT internally to represent expressions it's evaluating? (The expressions are a lot chunkier than a typical DSL.)

  • For one thing, you can't derive Show for a GADT, so for the Print step you either hand-roll Show instances or something like this:

    {-# LANGUAGE  GADTs, StandaloneDeriving  #-}
    
    data Term a where
      Lit :: Int -> Term Int
      Inc :: Term Int -> Term Int
      IsZ :: Term Int -> Term Bool
      If :: Term Bool -> Term a -> Term a -> Term a
      Pair :: (Show a, Show b) => Term a -> Term b -> Term (a,b)
      Fst :: (Show b) => Term (a,b) -> Term a
      Snd :: (Show a) => Term (a,b) -> Term b
    
    deriving instance (Show a) => Show (Term a)
    

    (It seems to me those Show constraints tangled in the constructors are already failing to separate concerns.)

I'm more thinking about the user-experience for somebody entering DSL expressions, rather than the programmer's convenience of the eval function. Either:

  • The user enters expressions directly using the GADT constructors. It's easy to make a syntactically correct but ill-typed mistake (say a mis-placed parens). Then GHCi gives rather unfriendly rejection messages. Or
  • The REPL takes the input as text and parses it. But for a GADT like that, getting a Read instance is real hard work. So perhaps
  • The application has two data structures: one being type-mistake-tolerant; the other being the GADT; and the validate step constructs the GADT AST, if it can do so type-safely.

At that last bullet, I seem to be back with 'smart constructors', that GADTs are supposed to improve on(?) What's more I've doubled the work somewhere.

I don't have a 'better way' to approach it. I'm wondering how to approach DSL applications in practice. (For context: I'm thinking about a database query environment, where type inference has to look at the types of the fields in the database to validate what operations on them.)

Addit: after working through the answer from @Alec

I see the code for pretty printing in glambda involves several layers of classes and instances. Something feels wrong here as opposed to what are the claimed advantages of a GADT for an AST. The idea of a (well-typed) AST is you can equally easily: eval it; or pretty-print it; or optimise it; or code-generate from it; etc.

glambda's seems to be aimed at eval'ing (which is fair enough given the purpose of the exercise). I'm wondering ...

  • Why the need to express the whole syntax for the (E)DSL in one datatype? (The wikibook example starts its straw man doing that data Expr = ...; and rapidly runs into type trouble. Well of course it does; that's never going to work; almost anything would work better than that; I feel cheated.)

  • If we end up writing classes and instances anyway, why not make each syntax production a separate datatype: data Lit = Lit Int ... data If b a1 a2 = If b a1 a2 ... Then a class IsTerm a c | a -> c where ... (i.e. a FunDep or maybe a Type Family whose instances tell us the Term's result-type.)

  • Now the EDSL uses the same constructors (the user doesn't care they're from different datatypes); and they apply 'sloppy' type-checking. Pretty printing/error reporting also doesn't need tight typechecking. Eval does, and insists on the IsTerm instances all lining up.

I didn't suggest this approach before, because it seemed to involve too much crufty code. But actually it's no worse than glambda -- that is, when you consider the whole functionality, not just the eval step.

It seems to me a big advantage to express the syntax only once. Furthermore it seems more extensible: add a new datatype per syntax production, rather than breaking open an existing datatype. Oh, and because they're H98 datatypes (no existentials), deriving works fine.

解决方案

Note that GHCi does not use GADTs to represent expressions. Even GHC's internal core expression type Expr is not a GADT.

DSLs

For the purpose of having a larger more fleshed out example of your Term type, consider glambda. Its Exp type even tracks variables at the type level.

  • There is a second UExp data type which, as you observed yourself, is what gets actually parsed from the REPL. This type then gets typechecked into Exp and passed on to a continuation with:

    check :: (MonadError Doc m, MonadReader Globals m)
          => UExp -> (forall t. STy t -> Exp '[] t -> m r)
          -> m r
    

  • Pretty-printing of UExp and Exp is hand-written, but at least uses the same code (it does this via a PrettyExp class).

  • The evaluation code itself is beautiful, but I doubt I need to sell you on that. :)

EDSLs

As I understand it, GADTs are splendid for EDSLs (embedded DSLs), since these are just portions of code in a large Haskell program. Yes, type errors can be complicated (and will come from GHC directly), but that's the price you pay for being able to maintain type-level invariants in your code. Consider, for instance, hoopl's representation of basic blocks in a CFG:

data Block n e x where
  BlockCO  :: n C O -> Block n O O          -> Block n C O
  BlockCC  :: n C O -> Block n O O -> n O C -> Block n C C
  BlockOC  ::          Block n O O -> n O C -> Block n O C

  BNil    :: Block n O O
  BMiddle :: n O O                      -> Block n O O
  BCat    :: Block n O O -> Block n O O -> Block n O O
  BSnoc   :: Block n O O -> n O O       -> Block n O O
  BCons   :: n O O       -> Block n O O -> Block n O O

Sure, you open yourself up to nasty type errors, but you also have the ability to track fallthrough information at the type-level. This makes it much easier to think about dataflow problems.

So what...?

The point I'm trying to make is: if your GADT is being constructed from a String (or a custom REPL), you'll have a rough time performing the translation. That's unavoidable because what you are doing is essentially re-implementing a simple type-checker. Your best bet is to confront this head on (as glambda does) and distinguish the parsing from the type-checking.

However, if you can afford to stay within the bounds of Haskell code, you can just hand parsing and typechecking to GHC. IMHO, EDSLs are way cooler and more practical that non-embedded DSLs.

这篇关于DSL的GADT:摆动和回旋处?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆