What causes Happy to throw a parse error?


Problem description

I've written a lexer in Alex and I'm trying to hook it up to a parser written in Happy. I'll try my best to summarize my problem without pasting huge chunks of code.

I know from my unit tests of my lexer that the string "\x7" is lexed to:

[TokenNonPrint '\x7', TokenEOF]

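My token type (spit out by the lexer) is Token. I've defined lexWrap and alexEOF as described in the linked question on using an Alex monadic lexer with Happy (question 20315739, "how-to-use-an-alex-monadic-lexer-with-happy"), which gives me the following header and token declarations: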

%name parseTokens 
%tokentype { Token }
%lexer { lexWrap } { alexEOF }
%monad { Alex }
%error { parseError }

%token
  NONPRINT {TokenNonPrint $$}
  PLAIN { TokenPlain $$ }

I invoke the parser+lexer combo with the following:

parseExpr :: String -> Either String [Expr]
parseExpr s = runAlex s parseTokens

And here are my first few productions:

exprs :: { [Expr] }
exprs
  : {- empty -} { trace "exprs 30" [] }
  | exprs expr { trace "exprs 31" $ $2 : $1 }

nonprint :: { Cmd }
  : NONPRINT { NonPrint $ parseNonPrint $1}

expr :: { Expr }
expr
  : nonprint {trace "expr 44" $ Cmd $ $1}
  | PLAIN { trace "expr 37" $ Plain $1 }

I'll leave out the datatype declarations of Expr and NonPrint since they're long and only the constructors Cmd and NonPrint matter here. The function parseNonPrint is defined at the bottom of Parse.y as:

parseNonPrint :: Char -> NonPrint
parseNonPrint '\x7' = Bell

Also, my error handling function looks like:

parseError :: Token -> Alex a
parseError tokens = error ("Error processing token: " ++ show tokens)

Written like this, I expect the following hspec test to pass:

parseExpr "\x7" `shouldBe` Right [Cmd (NonPrint Bell)]

But instead, I see "exprs 30" print once (even though I'm running 5 different unit tests) and all of my tests of parseExpr return Right []. I don't understand why that would be the case, but I changed the exprs production to prevent it:

exprs :: { [Expr] }
exprs
  : expr { trace "exprs 30" [$1] }
  | exprs expr { trace "exprs 31" $ $2 : $1 }

Now all of my tests fail on the first token they hit. For example, parseExpr "\x7" fails with:

uncaught exception: ErrorCall (Error processing token: TokenNonPrint '\a')

And I'm thoroughly confused, since I would expect the parser to take the path exprs -> expr -> nonprint -> NONPRINT and succeed. I don't see why this input would put the parser in an error state. None of the trace statements are hit (optimized away?).

What am I doing wrong?

Solution

It turns out the cause of this error was the innocuous line

%lexer { lexWrap } { alexEOF }

which was recommended by the linked question about using Alex with Happy (unfortunately, one of the top Google results for queries like "using Alex as a monadic lexer with Happy"). The fix is to change it to the following:

%lexer { lexWrap } { TokenEOF }

I had to dig into the generated code to uncover the issue. It is caused by the code derived from the %token directive, which looks as follows (I commented out all of my token declarations except for TokenNonPrint while trying to track down the error):

happyNewToken action sts stk
    = lexWrap(\tk -> 
    let cont i = happyDoAction i tk action sts stk in
    case tk of {
    alexEOF -> happyDoAction 2# tk action sts stk; -- !!!!
    TokenNonPrint happy_dollar_dollar -> cont 1#;
    _ -> happyError' tk
    })

Evidently, Happy transforms each line of the %token directive into one branch of a pattern match. It also inserts a branch for whatever was identified to it as the EOF token in the %lexer directive.

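Because the %lexer directive was given the name of a value, alexEOF, rather than a data constructor, TokenEOF, this branch of the case expression is a variable pattern. It rebinds the name alexEOF to whatever token was passed in by lexWrap, shadowing the original binding, and it matches every token, so the case expression short-circuits and the parser is told it has reached EOF every time. That would explain both symptoms: with the empty exprs production, the parser reduces the empty rule and accepts immediately, returning Right []; with the non-empty production, EOF is not a valid first token, so Happy calls parseError, which reports the real token (TokenNonPrint '\a').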
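The shadowing can be reproduced outside of Happy. What follows is only a minimal sketch: the Token type is a cut-down stand-in for the one in the question, and classifyBroken and classifyFixed are hypothetical names that imitate the shape of the generated case expression rather than reproduce Happy's actual output.

```haskell
module Main where

-- Cut-down stand-in for the question's token type (an assumption, not the real one).
data Token
  = TokenNonPrint Char
  | TokenPlain Char
  | TokenEOF
  deriving (Show, Eq)

-- A top-level value, like the alexEOF defined next to lexWrap.
alexEOF :: Token
alexEOF = TokenEOF

-- Imitates the generated code when %lexer is given { alexEOF }.
-- A lowercase name in a pattern is a fresh variable, so this first
-- alternative matches every token and shadows the top-level alexEOF.
classifyBroken :: Token -> String
classifyBroken tk = case tk of
  alexEOF         -> "EOF"
  TokenNonPrint _ -> "NONPRINT"  -- never reached
  _               -> "error"     -- never reached

-- Imitates the generated code when %lexer is given { TokenEOF }.
-- A constructor pattern only matches the EOF token itself.
classifyFixed :: Token -> String
classifyFixed tk = case tk of
  TokenEOF        -> "EOF"
  TokenNonPrint _ -> "NONPRINT"
  _               -> "error"

main :: IO ()
main = do
  putStrLn (classifyBroken (TokenNonPrint '\a'))  -- prints EOF
  putStrLn (classifyFixed (TokenNonPrint '\a'))   -- prints NONPRINT
```

Compiling this sketch also shows GHC flagging the unreachable alternatives in classifyBroken, but only as a warning, so the program still builds and runs.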

The mistake isn't caught by the type system, since the identifier alexEOF (or TokenEOF) doesn't appear anywhere else in the generated code. Misusing the %lexer directive like this will cause GHC to emit a warning, but, since the warning appears in generated code, it's impossible to distinguish it from all of the other harmless warnings the code throws out.
