递归解析器 [英] Recursive Parser
问题描述
我需要使用Megaparsec解析像这样的数据结构
I need to parse, using Megaparsec a data structure like
data Foo
= Simple String
| Dotted Foo String
我可以用点分隔字母数字字符串.
where I can have alphanumeric strings separated by dots.
例如,应将abc
解析为Simple "abc"
,将abc.def
解析为Dotted (Simple "abc") "def"
.
For example abc
should be parsed to Simple "abc"
and abc.def
to Dotted (Simple "abc") "def"
.
我现在的解析器就像
fooParser :: Parser Foo
fooParser
= Simple <$> alphaNum
<|> do
foo <- fooParser
constant "."
end <- alphaNum
pure $ Dotted foo end
这对于Simple
来说很好用,但是它不会解析任何Dotted
,因为第一个选项总是成功解析字符串的第一部分.
This works fine for Simple
, but it does not parse any Dotted
, because the first option always succeeds parsing the first piece of the string.
哪个是修复我的解析器的最佳选择?
Which is the best option to fix my parser?
推荐答案
它不解析任何Dotted,因为第一个选项总是成功解析字符串的第一部分.
it does not parse any Dotted, because the first option always succeeds parsing the first piece of the string.
通过更改替代方案的顺序,可以很容易地解决该问题.通常,只要您有一个总是可以匹配的替代方案,该替代方案就必须排在最后.
That problem can be fixed easily enough by changing the order of your alternatives. Generally, whenever you have an alternative that can always match, that alternative must come last.
但是,这只会导致您遇到下一个问题:您的Dotted
解析器是左递归的,而parsec不支持该解析器,这意味着一旦到达实际位置,它将导致无限递归.
However this will only lead you to your next problem: Your Dotted
parser is left-recursive, which parsec does not support, meaning it will lead to an infinite recursion once its actually reached.
通常,当我们想将左递归语法与不能处理左递归的解析算法一起使用时,我们用重复替换递归,然后对结果列表进行左折.因此,给出原始语法:
Generally when we want to use a left-recursive grammar with a parsing algorithm that does not handle left-recursion, we replace the recursion with repetition and then perform a left-fold on the resulting list. So given the original grammar:
foo ::= alphanum
| foo "." alphanum
我们可以像这样用重复来重写它:
We can rewrite it using repetition like this:
foo ::= alphanum ("." alphanum)*
现在,对Parsec的最直接翻译将对*
使用many
,然后将结果列表左对齐.但是,我们可能会注意到,模式rule ("seperator" rule)*
可以更简单地由sepBy1
匹配.所以这给了我们
Now the most direct translation to Parsec would use many
for the *
and then left-fold the resulting list. However, we might notice that the pattern rule ("seperator" rule)*
can be more simply matched by sepBy1
. So this gives us:
fooParser =
do
first : rest <- sepBy1 alphanum $ constant "."
return $ foldl Dotted (Simple first) rest
这篇关于递归解析器的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!