FParsec中的尾递归 [英] Tail-recursion in FParsec
问题描述
我遇到了具有两个递归分支的解析器的问题.为了更轻松地演示问题,我使用了 我对 如何将解析器重写为尾递归? 当我尝试为类似Lisp的语言编写解析器并使用真实的基准测试它时,我意识到了这个问题.我有相互递归类型的 FParsec的内置解析器组合器通常不允许尾部递归解析器实现,主要是因为错误处理的实现方式./p>
这意味着FParsec解析器的递归深度受堆栈大小的限制-大多数递归下降解析器就是这种情况.当然,这不会影响使用 .NET中的默认堆栈大小通常足以使用FParsec解析人为生成的输入(采用指定格式)(假设您使用内置组合器解析序列).但是,具有深层嵌套结构(例如5000级深s表达式)的机器生成的输入很容易导致堆栈溢出. 如果您事先知道输入中的嵌套深度被限制为合理的值,则可以简单地低-级别 FParsec的 API ,您显然必须the article written by Luca Bolognese as the example:pApplication
感兴趣,因为它们由两个pExpr
组成,而这两个反过来也可能是pApplication
.在以下基准测试中,解析器的堆栈空间不足:let generateString level =
let rec loop i =
seq {
if i < level then
yield "("
yield! loop level
yield " "
yield! loop (i+1)
yield ")"
else
yield "(x x)"
}
loop 0 |> String.concat ""
let N = 5000
let s = generateString N;;
let _ = fparseString s;;
Term
和VarBinding
和一个表现与上述pApplication
相同问题的let
解析器.我的解析器是在github上,以防有关非尾部的分析错误-递归问题.many
,sepBy
,chainl
等进行的序列解析,因为这些FParsec组合器都具有恒定栈空间使用的实现.
<expression> ::= <name> | <function> | <application>
<name> ::= nonblank character sequence
<function> ::= \ <name> . <body>
<body> ::= <expression>
<application> ::= ( <function expression> <argument expression> )
<function expression> ::= <expression>
<argument expression> ::= <expression>
The parser in the article is quite concise:
let ws = " \t\n"
let specialChars = ".)(\\\n"
let pWs = spaces
let pName = manyChars (noneOf (ws + specialChars)) |>> EName
let pExpr, pExprRef = createParserForwardedToRef<Expression, Unit>()
let curry2 f a b = f(a,b)
let pFunction = pchar '\\' >>. pipe2 pName (pchar '.' >>. pExpr) (curry2 Function)
let pApplication = pchar '(' >>. pipe2 pExpr (pWs >>. pExpr) (curry2 Application)
.>> pWs .>> pchar ')'
do pExprRef := pFunction <|> pApplication <|> pName
let pExpressions = sepBy pExpr spaces1
let fparseString text =
match run pExpressions text with
| Success(result, _, _) -> result
| Failure(errorMsg, _, _) -> failwith (sprintf "Failure: %s" errorMsg)
I'm interested in pApplication
since they consist of two pExpr
s which in turn could be pApplication
s too. The parser runs out of stack space in the following benchmark:
let generateString level =
let rec loop i =
seq {
if i < level then
yield "("
yield! loop level
yield " "
yield! loop (i+1)
yield ")"
else
yield "(x x)"
}
loop 0 |> String.concat ""
let N = 5000
let s = generateString N;;
let _ = fparseString s;;
How can I rewrite the parser to be tail-recursive?
I recognized the problem when trying to write a parser for a Lisp-like language and test it with real benchmarks. I have Term
and VarBinding
which are mutually recursive types and a let
parser which exhibits the same issue as pApplication
above. My parser is on github in case the analysis is wrong regarding the not tail-recursive problem.
The built-in parser combinators of FParsec generally don't allow for a tail-recursive parser implementation, mainly because of the way the error handling is implemented.
This means that the recursion depth of an FParsec parser is limited by the stack size -- as is the case for most recursive descent parsers. Of course, this doesn't affect the parsing of sequences with many
, sepBy
, chainl
etc., because these FParsec combinators all have implementations with constant stack space usage.
The default stack size in .NET is usually more than enough for parsing human-generated input in well-specified formats with FParsec (assuming you parse sequences with the built-in combinators). However, machine-generated input with a deeply nested structure (like your 5000 level deep s-expressions) can easily lead to a stack overflow.
If you know in advance that the nesting depth in your input is limited to a reasonable value, you could simply increase the stack size to an appropriate value. Hopefully this is true for your benchmark data.
Otherwise you may have to implement a special purpose parser function for the recursive element(s) of your grammar. You would need to implement this function using the low-level API of FParsec and you would obviously have to implement this function such that it uses the heap instead of the stack for keeping track of the nesting context and intermediate parsing results.
这篇关于FParsec中的尾递归的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!