解析和使用GADT [英] Parsing and the use of GADTs

查看:218
本文介绍了解析和使用GADT的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

编写解析器时遇到了问题。具体来说,我想成为不同类型的返回值。例如,我有两种不同的数据类型 FA PA 来表示两种不同的脂质类 -

  data FA = ClassLevelFA IntegerMass 
| FA CarbonChain
派生(Show,Eq,Ord)

数据PA = ClassLevelPA IntegerMass
| CombinedRadylsPA TwoCombinedRadyls
| UnknownSnPA Radyl Radyl
| KnownSnPA Radyl Radyl
derived(Show,Eq,Ord)

使用attoparsec,解析器来解析脂质速记符号。对于上面的数据类型,我有解析器 faParser paParser 。我希望能够解析 FA PA 。但是,由于 FA PA 是不同的数据类型,所以我不能这样做 -

  inputParser = faParser 
< |> paParser

我最近了解到GADT,我认为这可以解决我的问题。因此,我创建了一个GADT和一个 eval 函数,并更改了解析器 faParser paParser 。 -

  data ParsedLipid a where 
ParsedFA :: FA - > ParsedLipid FA
ParsedPA :: PA - > ParsedLipid PA

eval :: ParsedLipid a - > a
eval(ParsedFA val)= val
eval(ParsedPA val)= val

这让我关闭,但它看起来好像 ParsedFA ParsedPA 是不同的数据类型?例如,解析PA 17:1_18:1会给我一个类型 ParsedLipid PA 的值(不只是 ParsedLipid 就像我期待的那样)。因此,解析器 inputParser 仍然无法检查。

  let lipid =使用PA 17:1_18:1
* Main> :t lipid
lipid :: ParsedLipid PA

有关如何解决此问题的任何建议?

解决方案

@MathematicalOrchid指出你可能不需要GADT,而简单的总和类型可能就足够了。您可能会遇到 XY问题,但我对您的用例不足以作出明确的判断。这个答案是关于如何将非类型化数据转化为GADT。



正如你所看到的,你的解析函数不能返回 ParsedLipid a 因为这会让调用上下文自由选择 a ,这是没有意义的; a 实际上是由输入数据决定的。并且您不能返回 ParsedLipid FA ParsedLipid PA ,因为输入数据可能是任何一种类型。因此,从运行时数据构建GADT时的标准技巧 - 当您不知道索引的类型时 - 就是使用 https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/data-type-extensions.html#existential-quantificationrel =nofollow noreferrer> existential quantification / a>。

  data AParsedLipid = forall a。 AParsedLipid(ParsedLipid a)

类型参数 a 出现在 AParsedLipid 的右侧,但不在左侧。值 AParsedLipid 保证包含格式良好的 ParsedLipid ,但其确切类型保密。存在型是一种包装,它可以帮助我们从混乱的,无类型的现实世界转化为干净的,强类型的GADT。



保持存在性是个好主意量化的包装被推送到你的系统的边缘,在那里你必须与外界沟通。您可以使用签名来编写函数,如 ParsedLipid a - >一个在你的核心模型中,并将它们应用于边缘存在封装下的数据。您验证输入,将其包装为存在类型,然后使用强类型模型安全地处理它 - 因为您已经检查了输入,所以不必担心错误。



您可以解开一个 AParsedLipid 来获取您的 ParsedLipid ,并将模式匹配到在运行时确定 a 是 - 它可以是 FA PA

  showFA :: FA  - >字符串
showFA = ...
showPA :: PA - >字符串
showPA = ...

showLipid :: AParsedLipid - >字符串
showLipid(AParsedLipid(ParsedFA x))=AParsedLipid(ParsedFA++ showFA x ++)
showLipid(AParsedLipid(ParsedPA x))=AParsedLipid(ParsedPA++ showPA x ++)

您会注意到 a 不能出现在函数的返回类型中,该函数采用 AParsedLipid 。编译器必须知道返回类型;这种技术不允许你定义一个具有不同返回类型的函数。



当你构造一个 AParsedLipid ,你必须产生足够的证据来说服编译器,包装的 ParsedLipid 是格式良好的。在你的例子中,这归结为解析一个结构良好的 PA FA 然后将其包装起来。

 解析器::解析器AParsedLipid 
解析器= AParsedLipid< $> (fmap ParsedFA faParser< |> fmap ParsedPA paParser)

与GADT一起使用时有点棘手运行时数据。存在的包装器有效地擦除了 ParsedLipid - 中的额外编译时信息,AParsedLipid 同构于 FA PA 。 (证明代码中的同构是一个很好的练习。)根据我的经验,GADT在构造程序时比数据更好 - 他们擅长实现强类型的嵌入式域在编译时可以知道类型的索引的特定语言。例如, Yampa extensible-effects 都使用GADT作为它们的中心数据类型。这有助于编译器检查您在编写的代码中是否正确使用特定领域的语言(并且在某些情况下允许进行某些优化)。您很可能不会在实时数据的运行时构建FRP网络或单点效果。


I've ran into a problem while writing a parser. Specifically, I want to be return values of different types. For example, I have two different data types FA and PA to represent two different lipid classes -

data FA = ClassLevelFA IntegerMass
        | FA           CarbonChain
        deriving (Show, Eq, Ord)

data PA   = ClassLevelPA       IntegerMass
          | CombinedRadylsPA   TwoCombinedRadyls
          | UnknownSnPA        Radyl Radyl
          | KnownSnPA          Radyl Radyl
          deriving (Show, Eq, Ord)

Using attoparsec, I have built parsers to parse lipid shorthand notation. For the data types above, I have the parsers faParser and paParser. I would like to be able to parse for either an FA or PA. However, since FA and PA are different data types, I cannot do the following -

inputParser =  faParser
           <|> paParser

I have recently learnt about GADTs and I thought this would solve my problem. Consequently I went and made a GADT and an eval function and changed the parsers faParser and paParser. -

data ParsedLipid a where
  ParsedFA :: FA -> ParsedLipid FA
  ParsedPA :: PA -> ParsedLipid PA

eval :: ParsedLipid a -> a
eval (ParsedFA val) = val
eval (ParsedPA val) = val

This gets me close but it appears as if ParsedFA and ParsedPA are different data types? For example, parsing "PA 17:1_18:1" gives me a value of type ParsedLipid PA (not just ParsedLipid as I was expecting). Therefore, the parser inputParser still does not type check.

let lipid = use "PA 17:1_18:1"
*Main> :t lipid
lipid :: ParsedLipid PA

Any advice on how to get around this problem?

解决方案

@MathematicalOrchid points out that you probably don't need GADTs and a plain sum type might be enough. You may have an XY problem but I don't know enough about your use case to make a firm judgment. This answer is about how to turn untyped data into a GADT.

As you note, your parsing function can't return a ParsedLipid a because that leaves the calling context free to choose a which doesn't make sense; a is in fact determined by the input data. And you can't return a ParsedLipid FA or a ParsedLipid PA, because the input data may be of either type.

Therefore, the standard trick when building a GADT from runtime data - when you don't know the type of the index in advance - is to use existential quantification.

data AParsedLipid = forall a. AParsedLipid (ParsedLipid a)

The type parameter a appears on the right-hand side of AParsedLipid but not on the left. A value of AParsedLipid is guaranteed to contain a well-formed ParsedLipid, but its precise type is kept secret. An existential type is a wrapper which helps us to translate from the messy, untyped real world to a clean, strongly-typed GADT.

It's a good idea to keep the existentially quantified wrappers pushed to the edges of your system, where you have to communicate with the outside world. You can write functions with signatures like ParsedLipid a -> a in your core model and apply them to data under an existential wrapper at the edges. You validate your input, wrap it in an existential type, and then process it safely using your strongly-typed model - which doesn't have to worry about errors, because you already checked your input.

You can unpack an AParsedLipid to get your ParsedLipid back, and pattern-match on that to determine at runtime what a was - it'll either be FA or PA.

showFA :: FA -> String
showFA = ...
showPA :: PA -> String
showPA = ...

showLipid :: AParsedLipid -> String
showLipid (AParsedLipid (ParsedFA x)) = "AParsedLipid (ParsedFA "++ showFA x ++")"
showLipid (AParsedLipid (ParsedPA x)) = "AParsedLipid (ParsedPA "++ showPA x ++")"

You'll note that a can't appear in the return type of a function taking an AParsedLipid either, for the reasons outlined above. The return type must be known to the compiler; this technique does not allow you to define a "function with different return types".

When you're constructing an AParsedLipid, you have to generate enough evidence to convince the compiler that the wrapped ParsedLipid is well-formed. In your example, this comes down to parsing a well-typed PA or FA and then wrapping it up.

parser :: Parser AParsedLipid
parser = AParsedLipid <$> (fmap ParsedFA faParser <|> fmap ParsedPA paParser)

GADTs are a bit awkward when used with runtime data. The existential wrapper effectively erases the extra compile-time information in a ParsedLipid - AParsedLipid is isomorphic to Either FA PA. (Proving that isomorphism in code is a good exercise.) In my experience GADTs are much better at structuring programs than at structuring data - they excel at implementing strongly-typed embedded domain-specific languages for which the types' indices can be known at compile time. For example, Yampa and extensible-effects both use a GADT as their central data type. This helps the compiler to check that you're using the domain-specific language correctly in the code you write (and in some cases allows certain optimisations). It's pretty unlikely that you'll be building FRP networks or monadic effects at runtime from real-world data.

这篇关于解析和使用GADT的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆