How to use an Alex monadic lexer with Happy?


Question

I'm trying to learn to use Alex + Happy to build a parser; in particular, I'm interested in learning to use Alex's monad wrapper. I have already looked at the documentation for Alex and Happy, but for me both really lack any useful information on using them together. I managed to make them work together with the basic and posn wrappers, but I'm at a loss with monad.

I have already looked at different questions on SO about Alex, Happy and monadic lexers (including: Are there any tutorials on building a simple interpreter using Alex + Happy?), but none provides a simple example where monad is used.

Most of the code online uses Happy with a custom lexer function, or uses the basic or posn Alex wrappers.

Here's a simple lexer for an ini-like syntax:

{
module IniLexer where
}

%wrapper "monad"



$spaces = [\ \t]
$alpha = [a-zA-Z]
$digits = [0-9]
$alnum = [$alpha$digits]


@identifier = $alpha $alnum*

@comment = \#.*

@integer = $digits+

@boolean = (true) | (false)

@string = \"[^\"]*\"


:-

@integer    { mkL LInteger }
@boolean    { mkL LBoolean }
@string     { mkL LString }

@identifier  { mkL LIdentifier }

\[@identifier\] { mkL LSection }

=           { mkL LAssign }

\;          { mkL LEndAssign }
@comment    ;
[\ \t \n]+  ;


{

data LexemeClass = LInteger | LBoolean | LString | LIdentifier | LSection | LAssign | LEndAssign | LEOF
    deriving (Eq, Show)


mkL :: LexemeClass -> AlexInput -> Int -> Alex Token
mkL c (p, _, _, str) len = let t = take len str
                           in case c of
                                LInteger -> return (IntegerNum ((read t) :: Integer) p)
                                LBoolean -> return (BooleanVal (if t == "true"
                                                                   then True
                                                                   else False
                                                               ) p)
                                LString -> return (StringTxt (take (length t - 2) (drop 1 t)) p)
                                LIdentifier -> return (Identifier t p)
                                LSection -> return (SectionHeader (take (length t - 2) (drop 1 t)) p)
                                LAssign -> return (Assignment p)
                                LEndAssign -> return (EndAssignment p)


-- No idea why I have to write this myself. Documentation doesn't mention it.
alexEOF :: Alex Token
alexEOF = return Eof



data Token = SectionHeader {identifier :: String, position :: AlexPosn} |
             Identifier {name :: String, position :: AlexPosn}          |
             Assignment {position :: AlexPosn}                          |
             EndAssignment {position :: AlexPosn}                       |
             IntegerNum {value :: Integer, position :: AlexPosn}        |
             BooleanVal {istrue :: Bool, position :: AlexPosn}          |
             StringTxt  {text :: String, position :: AlexPosn}          |
             Eof
    deriving (Eq, Show)


}

And here's the corresponding Happy parser:

{
module Main where

import IniLexer

}



%name parseIniFile
%error {parseError}
%lexer  {alexMonadScan} {AlexEOF}
%monad {Alex}
%tokentype {Token}
%token
    SECTION     {SectionHeader name _ }
    IDENT       {Identifier name _ }
    '='         {Assignment _ }
    INT         {IntegerNum value _ }
    BOOL        {BooleanVal istrue _ }
    STRING      {StringTxt text _ }
    ';'         {EndAssignment _ }


%%


ConfigFile : SequenceOfSections                    {reverse $1}

SequenceOfSections : {- empty -}                   {   []  }
                   | SequenceOfSections Section    {$2 : $1}


Section : SECTION SectionBody                      {Section (identifier $1) (reverse $2)}


SectionBody : {- empty -}        {[]}
            | SectionBody AssignmentLine ';' {$2 : $1}


AssignmentLine : IDENT '=' Value      {(name $1, $3)}

Value : INT         {IntV (value $1)}
      | BOOL        {BoolV (istrue $1)}
      | STRING      {StringV (text $1)}


{

data Value = IntV Integer | BoolV Bool | StringV String
    deriving (Eq, Show)

data Section = Section String [(String, Value)]
    deriving (Eq, Show)

data IniFile = IniFile [Section]
    deriving (Eq, Show)


parseError :: [Token] -> Alex a
parseError t = fail "a"

main = do
    s <- getContents
    print $ parseIniFile $ runAlex s alexMonadScan

}

Which raises a lot of compiler errors:

[...]
Couldn't match expected type `(AlexReturn t1 -> Alex a0) -> t0'
                with actual type `Alex Token'
    The function `alexMonadScan' is applied to one argument,
    but its type `Alex Token' has none
[...]

How should I modify the parser to use alexMonadScan? The Happy documentation isn't clear at all and tries hard not to use any clarifying example (or, from my point of view, the examples it provides fail to clarify anything).

If needed I could post my posn version of this same lexer+parser.

Solution

Your lexer's definition is completely fine as far as I can tell. Assuming there are no bugs there, the only problems you need to fix are in your parser's configuration. The first problem is that alexMonadScan is the wrong lexer to hand to Happy. While that function is the interface to the Alex lexer, it has the type

alexMonadScan :: Alex result

But the lexer Happy wants is of type

lexer :: (Token -> P a) -> P a

where P is the monad we are using (here, Alex). In other words, the lexer should give us an Alex a when given a continuation that consumes the next token. A simple wrapper is all we need here:

lexwrap :: (Token -> Alex a) -> Alex a
lexwrap cont = do
    token <- alexMonadScan
    cont token

or equivalently

lexwrap = (alexMonadScan >>=)
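To see why the wrapper has this shape, here is a minimal sketch that does not depend on Alex at all. The names Lex, scan and Tok are hypothetical stand-ins for Alex, alexMonadScan and Token; only the types mirror what Happy expects, and the toy state monad is an assumption made for illustration.

```haskell
-- Toy stand-ins (assumptions, not Alex's real definitions):
--   Lex  ~ Alex           a state monad over the remaining input
--   scan ~ alexMonadScan  yields the next token
data Tok = TWord String | TEof
  deriving (Eq, Show)

newtype Lex a = Lex { runLex :: [String] -> ([String], a) }

instance Functor Lex where
  fmap f (Lex g) = Lex $ \s -> let (s', a) = g s in (s', f a)

instance Applicative Lex where
  pure a = Lex $ \s -> (s, a)
  Lex f <*> Lex g = Lex $ \s ->
    let (s1, h) = f s
        (s2, a) = g s1
    in (s2, h a)

instance Monad Lex where
  Lex g >>= k = Lex $ \s -> let (s', a) = g s in runLex (k a) s'

-- A plain monadic scanner: note that it takes no continuation.
scan :: Lex Tok
scan = Lex $ \s -> case s of
  []       -> ([], TEof)
  (w : ws) -> (ws, TWord w)

-- The shape Happy asks for: hand the next token to a continuation.
lexwrap :: (Tok -> Lex a) -> Lex a
lexwrap cont = scan >>= cont

-- The point-free spelling, for comparison.
lexwrap' :: (Tok -> Lex a) -> Lex a
lexwrap' = (scan >>=)

-- Take two tokens in continuation style, roughly how a generated
-- parser pulls tokens one at a time.
firstTwo :: Lex (Tok, Tok)
firstTwo = lexwrap $ \a -> lexwrap' $ \b -> pure (a, b)

example :: (Tok, Tok)
example = snd (runLex firstTwo ["name"])
```

Loading this in GHCi and evaluating example gives (TWord "name",TEof): the first call consumes the only word, and the second call sees the end of input.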

Second, using alexEOF in the %lexer directive will cause your parser to fail on every input. The name you supply there is inserted into a branch of a case statement in the generated code, so you must use the name of a data constructor rather than a value. In particular, you need to use the data constructor that Alex will emit to signal EOF.

This makes our lexer line in the parser a little different.

%lexer {lexwrap} {Eof}

(As a side note, this is the reason that you need to write alexEOF = return Eof yourself. The data constructor you return inside alexEOF needs to pattern-match against the data constructor you identify to Happy as the one that ends the file. Alex has no way of knowing what you want to emit, and Happy has no way of knowing what you chose to emit via Alex.)
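The effect of putting a value name in pattern position can be reproduced in a few lines. This is only a hand-written analogue of the dispatch, not Happy's actual generated code; classify, classifyBroken and the cut-down Token type are hypothetical names used for illustration.

```haskell
-- Cut-down token type (an assumption; the real one has more constructors).
data Token = Identifier String | Eof
  deriving (Eq, Show)

-- Analogue of the generated dispatch when the EOF marker is the
-- constructor Eof: only the real end-of-file token takes that branch.
classify :: Token -> String
classify tk = case tk of
  Eof          -> "end of input"
  Identifier n -> "identifier " ++ n

-- Analogue of what happens when a lowercase name is spliced in instead.
-- In pattern position a lowercase name is a fresh variable, so it matches
-- every token and the remaining branches become unreachable. GHC accepts
-- this with an overlapping-pattern warning.
classifyBroken :: Token -> String
classifyBroken tk = case tk of
  alexEOF      -> "end of input"
  Identifier n -> "identifier " ++ n
```

Here classifyBroken (Identifier "x") evaluates to "end of input", which matches the symptom described above: every token looks like the end of the file.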

Now the next problem is that your parseError's type is incorrect. When using just a monad, that is indeed the type you need, but when you add a lexer into the mix, your parseError must have a different type. Also, using fail is probably not advised, so here is a slightly better definition:

parseError :: Token -> Alex a
parseError _ = alexError "Why is using happy and alex so hard"
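Since every token except Eof carries an AlexPosn, the error message can also say where parsing stopped. The following is a sketch under assumptions: describeError is a hypothetical helper, and the AlexPosn and Token declarations are local copies so that the block compiles alone (in the real project they come from the generated IniLexer module).

```haskell
-- Local copy of the position type defined by Alex's wrappers:
-- absolute offset, line, column.
data AlexPosn = AlexPn !Int !Int !Int
  deriving (Eq, Show)

-- Local copy of the Token type from the question.
data Token
  = SectionHeader { identifier :: String, position :: AlexPosn }
  | Identifier    { name :: String, position :: AlexPosn }
  | Assignment    { position :: AlexPosn }
  | EndAssignment { position :: AlexPosn }
  | IntegerNum    { value :: Integer, position :: AlexPosn }
  | BooleanVal    { istrue :: Bool, position :: AlexPosn }
  | StringTxt     { text :: String, position :: AlexPosn }
  | Eof
  deriving (Eq, Show)

-- Hypothetical helper: build the message to pass to alexError.
-- Eof has no position field, so it is matched first; calling
-- `position Eof` would raise a runtime error.
describeError :: Token -> String
describeError Eof = "unexpected end of input"
describeError tok =
  let AlexPn _ line col = position tok
  in "unexpected token at line " ++ show line ++ ", column " ++ show col
```

In the .y file this would be used as parseError tok = alexError (describeError tok).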

Finally, the main function is defined a little strangely here. To call the parser, we want to invoke it with runAlex, so here is a quick wrapper for it. The string passed in is the string that you wish to parse.

parse :: String -> Either String [Section]
parse s = runAlex s parseIniFile

The type of the function parse is determined by parseIniFile's definition. Here it is an Alex [Section], so runAlex returns an Either String [Section].
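With parse in place, main only has to deal with the two cases of the Either. A small sketch, assuming a hypothetical helper named report; the example values below are made up and are not real parser output.

```haskell
-- Hypothetical helper: turn the Either returned by `parse` into output text.
report :: Show a => Either String a -> String
report (Left err)     = "parse failed: " ++ err
report (Right result) = show result

-- In the .y trailer, where `parse` is in scope, main could then be:
--   main = getContents >>= putStrLn . report . parse

-- Made-up values standing in for a parse result and an error.
exampleOk, exampleErr :: String
exampleOk  = report (Right [("port", 8080)] :: Either String [(String, Int)])
exampleErr = report (Left "some error" :: Either String [(String, Int)])
```

Keeping the formatting in a pure function means main stays a one-liner and the error path is easy to check in GHCi.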

I think that's everything.
