How to use an Alex monadic lexer with Happy?

Question

I'm trying to learn to use Alex + Happy to build a parser; in particular, I'm interested in learning to use Alex's monad wrapper. I have already looked at the documentation of Alex and Happy, but to me they both really lack useful information on using the two together. I managed to make them work together with the basic and posn wrappers, but I'm at a loss with monad.

I have also looked at several questions on SO about Alex, Happy and monadic lexers (including "Are there any tutorials on building a simple interpreter using Alex + Happy?"), but none of them provides a simple example where monad is used.

Most of the code online uses Happy with a custom lexer function, or uses the basic or posn Alex wrappers.

Here's a simple lexer for an ini-like syntax:

{
module IniLexer where
}

%wrapper "monad"



$spaces = [\ \t]
$alpha = [a-zA-Z]
$digits = [0-9]
$alnum = [$alpha$digits]


@identifier = $alpha $alnum*

@comment = \#.*

@integer = $digits+

@boolean = (true) | (false)

@string = \"[^\"]*\"


:-

@integer    { mkL LInteger }
@boolean    { mkL LBoolean }
@string     { mkL LString }

@identifier  { mkL LIdentifier }

\[@identifier\] { mkL LSection }

=           { mkL LAssign }

\;          { mkL LEndAssign }
@comment    ;
[\ \t\n]+  ;


{

data LexemeClass = LInteger | LBoolean | LString | LIdentifier | LSection | LAssign | LEndAssign | LEOF
    deriving (Eq, Show)


mkL :: LexemeClass -> AlexInput -> Int -> Alex Token
mkL c (p, _, _, str) len = let t = take len str
                           in case c of
                                LInteger -> return (IntegerNum ((read t) :: Integer) p)
                                LBoolean -> return (BooleanVal (if t == "true"
                                                                   then True
                                                                   else False
                                                               ) p)
                                LString -> return (StringTxt (take (length t - 2) (drop 1 t)) p)
                                LIdentifier -> return (Identifier t p)
                                LSection -> return (SectionHeader (take (length t - 2) (drop 1 t)) p)
                                LAssign -> return (Assignment p)
                                LEndAssign -> return (EndAssignment p)


-- No idea why I have to write this myself. Documentation doesn't mention it.
alexEOF :: Alex Token
alexEOF = return Eof



data Token = SectionHeader {identifier :: String, position :: AlexPosn} |
             Identifier {name :: String, position :: AlexPosn}          |
             Assignment {position :: AlexPosn}                          |
             EndAssignment {position :: AlexPosn}                       |
             IntegerNum {value :: Integer, position :: AlexPosn}        |
             BooleanVal {istrue :: Bool, position :: AlexPosn}          |
             StringTxt  {text :: String, position :: AlexPosn}          |
             Eof
    deriving (Eq, Show)


}

And here's the corresponding Happy parser:

{
module Main where

import IniLexer

}



%name parseIniFile
%error {parseError}
%lexer  {alexMonadScan} {AlexEOF}
%monad {Alex}
%tokentype {Token}
%token
    SECTION     {SectionHeader name _ }
    IDENT       {Identifier name _ }
    '='         {Assignment _ }
    INT         {IntegerNum value _ }
    BOOL        {BooleanVal istrue _ }
    STRING      {StringTxt text _ }
    ';'         {EndAssignment _ }


%%


ConfigFile : SequenceOfSections                    {reverse $1}

SequenceOfSections : {- empty -}                   {   []  }
                   | SequenceOfSections Section    {$2 : $1}


Section : SECTION SectionBody                      {Section (identifier $1) (reverse $2)}


SectionBody : {- empty -}        {[]}
            | SectionBody AssignmentLine ';' {$2 : $1}


AssignmentLine : IDENT '=' Value      {(name $1, $3)}

Value : INT         {IntV (value $1)}
      | BOOL        {BoolV (istrue $1)}
      | STRING      {StringV (text $1)}


{

data Value = IntV Integer | BoolV Bool | StringV String
    deriving (Eq, Show)

data Section = Section String [(String, Value)]
    deriving (Eq, Show)

data IniFile = IniFile [Section]
    deriving (Eq, Show)


parseError :: [Token] -> Alex a
parseError t = fail "a"

main = do
    s <- getContents
    print $ parseIniFile $ runAlex s alexMonadScan

}

This raises a lot of compiler errors:

[...]
Couldn't match expected type `(AlexReturn t1 -> Alex a0) -> t0'
                with actual type `Alex Token'
    The function `alexMonadScan' is applied to one argument,
    but its type `Alex Token' has none
[...]

How should I modify the parser to use alexMonadScan? The Happy documentation isn't clear at all and seems to avoid clarifying examples (or, from my point of view, the examples it does provide fail to clarify anything).

If needed I could post my posn version of this same lexer+parser.

Answer

Your lexer's definition is completely fine as far as I can tell. Assuming there are no bugs there, the only problems you need to fix are in your parser's configuration. The first is that you are handing Happy the wrong lexer function. While alexMonadScan is the interface to the Alex lexer, it has the type

alexMonadScan :: Alex result

But the lexer Happy wants is of type

lexer :: (Token -> P a) -> P a

where P is the monad we are using (here, Alex). In other words, given a continuation of type Token -> Alex a, the lexer should produce an Alex a. A simple wrapper is all we need:

lexwrap :: (Token -> Alex a) -> Alex a
lexwrap cont = do
    token <- alexMonadScan
    cont token

or, equivalently,

lexwrap = (alexMonadScan >>=)
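To see how this continuation-passing interface fits together without running Alex and Happy, here is a self-contained sketch. Everything except lexwrap is a simplified stand-in written for illustration: the mock Alex type is a state-and-error monad over the remaining input, the mock alexMonadScan only recognizes integers, and sumInts is a hypothetical hand-written consumer that imitates how a parser generated with %lexer {lexwrap} {Eof} pulls one token per continuation and stops at Eof. The real generated definitions differ.

```haskell
import Data.Char (isDigit, isSpace)

-- Cut-down, hypothetical Token type (the real one carries positions).
data Token = IntegerNum Integer | Eof
  deriving (Eq, Show)

-- MOCK of Alex's monad: threads the remaining input, fails with a String.
newtype Alex a = Alex { unAlex :: String -> Either String (String, a) }

instance Functor Alex where
  fmap f (Alex m) = Alex $ \s -> fmap (fmap f) (m s)

instance Applicative Alex where
  pure x = Alex $ \s -> Right (s, x)
  Alex mf <*> Alex mx = Alex $ \s -> case mf s of
    Left e        -> Left e
    Right (s', f) -> fmap (fmap f) (mx s')

instance Monad Alex where
  Alex m >>= k = Alex $ \s -> case m s of
    Left e        -> Left e
    Right (s', x) -> unAlex (k x) s'

-- Same signature as the real runAlex: run an action over an input string.
runAlex :: String -> Alex a -> Either String a
runAlex input (Alex m) = fmap snd (m input)

-- MOCK scanner: returns one token per call, and Eof once input is exhausted.
alexMonadScan :: Alex Token
alexMonadScan = Alex scan
  where
    scan s = case dropWhile isSpace s of
      "" -> Right ("", Eof)
      rest@(c:_)
        | isDigit c ->
            let (ds, rest') = span isDigit rest
            in Right (rest', IntegerNum (read ds))
        | otherwise -> Left ("lexical error at " ++ show c)

-- The adapter from the answer: Happy wants (Token -> Alex a) -> Alex a.
lexwrap :: (Token -> Alex a) -> Alex a
lexwrap = (alexMonadScan >>=)

-- Hypothetical stand-in for a generated parser: each step hands lexwrap a
-- continuation, and parsing finishes when the EOF token arrives.
sumInts :: Alex Integer
sumInts = go 0
  where
    go acc = lexwrap $ \tok -> case tok of
      Eof          -> pure acc
      IntegerNum n -> go (acc + n)
```

For example, runAlex "1 2 3" sumInts evaluates to Right 6, and runAlex "1 x" sumInts to Left "lexical error at 'x'", the same Either String result shape that runAlex gives you for the real parser.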

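Second, the end-of-file token in the %lexer directive must be a pattern over your Token type. The text you supply there is inserted into a branch of a case statement in the generated code, so you must use the name of a data constructor rather than a value. A lowercase name such as alexEOF would just bind a variable that matches every token, so your parser would fail on every input; AlexEOF, on the other hand, is a constructor of Alex's AlexReturn type rather than of Token. In particular, you need to use the data constructor that your lexer emits to signal EOF.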

This makes our lexer line in the parser a little different.

%lexer {lexwrap} {Eof}

(As a side note, this is why you need to write alexEOF = return Eof yourself. The constructor you return from alexEOF must match the constructor you tell Happy marks the end of the file. Alex has no way of knowing what you want to emit, and Happy has no way of knowing what you chose to emit via Alex.)
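The case-statement point can be reproduced in isolation. The standalone Haskell below is a simplified imitation, not Happy's actual generated code, using a cut-down hypothetical Token type: the EOF text from %lexer ends up as a pattern in a case, so a constructor such as Eof matches only the end-of-file token, while a lowercase name is just a fresh variable that matches anything.

```haskell
-- Cut-down, hypothetical Token type for illustration only.
data Token = Identifier String | Eof
  deriving (Eq, Show)

-- Imitates splicing {Eof} into a case: only the Eof constructor matches.
isEndOfInput :: Token -> Bool
isEndOfInput tok = case tok of
  Eof -> True
  _   -> False

-- Imitates splicing a lowercase name such as {alexEOF}: here alexEOF is a
-- fresh pattern variable, so this branch matches every token.
isEndOfInputWrong :: Token -> Bool
isEndOfInputWrong tok = case tok of
  alexEOF -> True
```

isEndOfInputWrong (Identifier "x") is True, which is why a lowercase name would make the parser treat the very first token as end of input. GHC accepts the second function, at most warning about an unused binding under -Wall.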

The next problem is that your parseError has the wrong type. With %monad alone, Happy passes the error function the remaining list of tokens, so [Token] -> Alex a is indeed correct; once you add %lexer, Happy passes only the current token, so parseError must have type Token -> Alex a. Also, using fail is probably not advisable, since the monad wrapper provides alexError. Here is a slightly better definition:

parseError :: Token -> Alex a
parseError _ = alexError "Why is using happy and alex so hard"

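Finally, the main function here is defined a little strangely: it passes the result of runAlex s alexMonadScan to parseIniFile, but parseIniFile is itself an Alex action, not a function. To call the parser, we want to run it with runAlex instead, so here is a quick wrapper for it. The string passed in is the string you wish to parse.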

parse :: String -> Either String [Section]
parse s = runAlex s parseIniFile

The type of parse follows from parseIniFile's type. Here parseIniFile is an Alex [Section], so parse returns an Either String [Section].

I think that's everything.