递归解析器 [英] Recursive Parser

查看:133
本文介绍了递归解析器的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要使用Megaparsec解析像这样的数据结构

I need to parse, using Megaparsec a data structure like

data Foo
    = Simple String
    | Dotted Foo String

我可以用点分隔字母数字字符串.

where I can have alphanumeric strings separated by dots.

例如,应将abc解析为Simple "abc",将abc.def解析为Dotted (Simple "abc") "def".

For example abc should be parsed to Simple "abc" and abc.def to Dotted (Simple "abc") "def".

我现在的解析器就像

fooParser :: Parser Foo
fooParser
    = Simple <$> alphaNum
    <|> do
        foo <- fooParser
        constant "."
        end <- alphaNum
        pure $ Dotted foo end

这对于Simple来说很好用,但是它不会解析任何Dotted,因为第一个选项总是成功解析字符串的第一部分.

This works fine for Simple, but it does not parse any Dotted, because the first option always succeeds parsing the first piece of the string.

哪个是修复我的解析器的最佳选择?

Which is the best option to fix my parser?

推荐答案

它不解析任何Dotted,因为第一个选项总是成功解析字符串的第一部分.

it does not parse any Dotted, because the first option always succeeds parsing the first piece of the string.

通过更改替代方案的顺序,可以很容易地解决该问题.通常,只要您有一个总是可以匹配的替代方案,该替代方案就必须排在最后.

That problem can be fixed easily enough by changing the order of your alternatives. Generally, whenever you have an alternative that can always match, that alternative must come last.

但是,这只会导致您遇到下一个问题:您的Dotted解析器是左递归的,而parsec不支持该解析器,这意味着一旦到达实际位置,它将导致无限递归.

However this will only lead you to your next problem: Your Dotted parser is left-recursive, which parsec does not support, meaning it will lead to an infinite recursion once its actually reached.

通常,当我们想将左递归语法与不能处理左递归的解析算法一起使用时,我们用重复替换递归,然后对结果列表进行左折.因此,给出原始语法:

Generally when we want to use a left-recursive grammar with a parsing algorithm that does not handle left-recursion, we replace the recursion with repetition and then perform a left-fold on the resulting list. So given the original grammar:

foo ::= alphanum
      | foo "." alphanum

我们可以像这样用重复来重写它:

We can rewrite it using repetition like this:

foo ::= alphanum ("." alphanum)*

现在,对Parsec的最直接翻译将对*使用many,然后将结果列表左对齐.但是,我们可能会注意到,模式rule ("seperator" rule)*可以更简单地由sepBy1匹配.所以这给了我们

Now the most direct translation to Parsec would use many for the * and then left-fold the resulting list. However, we might notice that the pattern rule ("seperator" rule)* can be more simply matched by sepBy1. So this gives us:

fooParser =
  do
    first : rest <- sepBy1 alphanum $ constant "."
    return $ foldl Dotted (Simple first) rest

这篇关于递归解析器的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆