Megaparsec:无法解析递归算术字符串 [英] Megaparsec: Unable to parse recursive arithmetic string

查看:144
本文介绍了Megaparsec:无法解析递归算术字符串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用Megaparsec开发一个小型解析器,并尝试解析算术.

I'm working on a small parser using Megaparsec and trying to parse arithmetic.

-- Arithmetic expressions
data Aexp = N Num 
            | V Var 
            | Mult Aexp Aexp
            | Add Aexp Aexp 
            | Sub Aexp Aexp 
             deriving (Show, Eq, Read)


arithParser :: Parser Aexp
arithParser = V <$> strParser
            <|> N <$> numParser
            <|> Mult <$> arithParser <* tok "*" <*> arithParser
--boolParser :: Parser Bexp


strParser :: Parser Var
strParser = tok "\"" *> some (noneOf ("\n\r\"=[]{},:")) <* tok "\""

numParser :: Parser Num
numParser = (some (oneOf ['0' .. '9']) >>= return . read) <* whitespace

如果我运行命令Parse arithParser "5*5" "5*5",它将仅返回Right (N 5),而应返回Mult(N 5) (N 5).由于arithParser中的优先级.但是,如果我更改顺序,则似乎会陷入无限循环并崩溃.

If I run the command Parse arithParser "5*5" "5*5" it just returns Right (N 5), where it should return Mult(N 5) (N 5). Because of the precedence in the arithParser. But if I change the order then it seems to go into an infinite loop and crash.

不确定我在这里做错了什么,我们将不胜感激.

Not sure what I'm doing wrong here, any help would be appreciated.

推荐答案

Parsec尝试使用<|>的左替代项,然后再尝试使用右替代项.如果左选择成功,那么它将不会理会右选择.因此,在这种情况下,当输入5*5作为输入时,Parsec的过程如下所示:

Parsec tries the left alternative of <|> before it tries the right one. If the left alternative succeeds then it won't bother with the right one. So in this instance, when fed the input 5*5, Parsec's process looks like this:

  1. 尝试V <$> strParser. strParsertok "\""开头,但是输入字符串不是以'"'开头,因此strParser失败.
  2. 尝试N <$> numParser. numParser成功解析了数字5,因此Parsec仅返回N 5.
  3. 完成!无需尝试第三种选择.
  1. Try V <$> strParser. strParser begins with tok "\"", but the input string doesn't begin with '"' so strParser fails.
  2. Try N <$> numParser. numParser successfully parses the number 5, so Parsec just returns N 5.
  3. Done! No need to try the third alternative.


因此,我们可以尝试通过将Mult选项上移到顶部并包裹在try中来修补此解析器,以便它可以回溯并尝试numParserstrParser(如果输入结果不是)成为乘法.


So we can attempt to patch this parser up by moving the Mult option up to the top, wrapped in a try so that it can backtrack and try numParser or strParser if the input turns out not to be a multiplication.

arithParser :: Parser Aexp
arithParser = try (Mult <$> arithParser <* tok "*" <*> arithParser)
            <|> N <$> numParser
            <|> V <$> strParser

此解析器还有另一个更微妙的问题.让我们按照上述步骤进行操作.

This parser has another, more subtle problem. Let's walk through the steps, as above.

  1. 尝试try (Mult <$> arithParser <* tok "*" <*> arithParser).此解析器以arithParser开头,因此递归调用arithParser.
  2. 尝试try (Mult <$> arithParser <* tok "*" <*> arithParser).此解析器以arithParser开头,因此递归调用arithParser.
  3. 尝试try (Mult <$> arithParser <* tok "*" <*> arithParser).此解析器以arithParser开头,因此递归调用arithParser.
  4. ...
  1. Try try (Mult <$> arithParser <* tok "*" <*> arithParser). This parser begins with arithParser, so recursively call arithParser.
  2. Try try (Mult <$> arithParser <* tok "*" <*> arithParser). This parser begins with arithParser, so recursively call arithParser.
  3. Try try (Mult <$> arithParser <* tok "*" <*> arithParser). This parser begins with arithParser, so recursively call arithParser.
  4. ...

这是一个无限循环. Parsec无法处理左递归语法.您必须设计解析器,以使其在递归调用之前至少消耗一个令牌.这样做的一种常见方法是修整"语法:

It's an infinite loop. Parsec can't handle left-recursive grammars. You have to design your parser so that it consumes at least one token before a recursive call. One common way of doing this is to "flatten out" your grammar:

expr, term :: Parser AExp
expr = do
    n <- term
    rest <- optional $ tok "*" *> expr
    return $ maybe n (Mult n) rest
term = N <$> numParser
    <|> V <$> strParser
    <|> parenthesised expr

parenthesised = between (char '(') (char ')')

在这里,我将解析器拆分为一个解析器,该解析器解析任意expr-一个term(可选),后跟一个乘法符号和一个被乘数expr-以及一个解析单个term s-数字的解析器,字符串和带括号的表达式.现在可以递归调用expr了-expr中的一个仅在您解析了term(总是消耗输入)之后才发生,而term中的一个仅在您解析了开头之后才发生括号.

Here I've split up the parser into one which parses an arbitrary expr - a term optionally followed by a multiplication symbol and a multiplicand expr - and one which parses single terms - numbers, strings, and parenthesised expressions. The recursive calls to expr are OK now - the one inside expr happens only after you've parsed a term (which always consumes input) and the one inside term happens only after you've parsed an opening parenthesis.

请注意,expr具有类似列表的结构:它解析单个内容,然后可能包含许多内容.通常,您应该考虑解析器消耗了输入令牌的线性输入流,因此列表形解析器往往比树形解析器更有效也就不足为奇了.

Note that expr has a list-like structure: it parses a single thing possibly followed by many things. In general you should think of parsers consuming a linear input stream of input tokens, so it's not surprising that list-shaped parsers tend to be more effective than tree-shaped ones.

Text.Megaparsec.Expr 模块包含用于打包此模式并使用任意优先级和固定性规则解析表达式的函数.

The Text.Megaparsec.Expr module contains functions which package up this pattern and parse expressions with arbitrary precedence and fixity rules.

expr = makeExprParser term [[InfixR $ tok "*" $> Mult]]

这篇关于Megaparsec:无法解析递归算术字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆