FParsec中的尾递归 [英] Tail-recursion in FParsec

查看:73
本文介绍了FParsec中的尾递归的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我遇到了具有两个递归分支的解析器的问题.为了更轻松地演示问题,我使用了

我对pApplication感兴趣,因为它们由两个pExpr组成,而这两个反过来也可能是pApplication.在以下基准测试中,解析器的堆栈空间不足:

let generateString level =
    let rec loop i =
        seq {
                if i < level then
                    yield "("
                    yield! loop level
                    yield " "
                    yield! loop (i+1)
                    yield ")"
                else 
                    yield "(x x)"
        }
    loop 0 |> String.concat ""

let N = 5000
let s = generateString N;; 
let _ = fparseString s;;

如何将解析器重写为尾递归?

当我尝试为类似Lisp的语言编写解析器并使用真实的基准测试它时,我意识到了这个问题.我有相互递归类型的TermVarBinding和一个表现与上述pApplication相同问题的let解析器.我的解析器是在github上,以防有关非尾部的分析错误-递归问题.

解决方案

FParsec的内置解析器组合器通常不允许尾部递归解析器实现,主要是因为错误处理的实现方式./p>

这意味着FParsec解析器的递归深度受堆栈大小的限制-大多数递归下降解析器就是这种情况.当然,这不会影响使用manysepBychainl等进行的序列解析,因为这些FParsec组合器都具有恒定栈空间使用的实现.

.NET中的默认堆栈大小通常足以使用FParsec解析人为生成的输入(采用指定格式)(假设您使用内置组合器解析序列).但是,具有深层嵌套结构(例如5000级深s表达式)的机器生成的输入很容易导致堆栈溢出.

如果您事先知道输入中的嵌套深度被限制为合理的值,则可以简单地-级别 FParsec的 API ,您显然必须the article written by Luca Bolognese as the example:

<expression> ::= <name> | <function> | <application>  
<name> ::= non­blank character sequence  
<function> ::= \ <name> . <body>  
<body> ::= <expression>  
<application> ::= ( <function expression> <argument expression> )  
<function expression> ::= <expression>  
<argument expression> ::= <expression>

The parser in the article is quite concise:

let ws = " \t\n" 
let specialChars = ".)(\\\n" 

let pWs = spaces 
let pName = manyChars (noneOf (ws + specialChars)) |>> EName 

let pExpr, pExprRef = createParserForwardedToRef<Expression, Unit>() 

let curry2 f a b = f(a,b) 
let pFunction = pchar '\\' >>. pipe2 pName (pchar '.' >>. pExpr) (curry2 Function) 

let pApplication = pchar '(' >>. pipe2 pExpr (pWs >>. pExpr) (curry2 Application)
                            .>> pWs .>> pchar ')'

do pExprRef := pFunction <|> pApplication <|> pName 

let pExpressions = sepBy pExpr spaces1 

let fparseString text = 
    match run pExpressions text with 
    | Success(result, _, _)   -> result 
    | Failure(errorMsg, _, _) -> failwith (sprintf "Failure: %s" errorMsg) 

I'm interested in pApplication since they consist of two pExprs which in turn could be pApplications too. The parser runs out of stack space in the following benchmark:

let generateString level =
    let rec loop i =
        seq {
                if i < level then
                    yield "("
                    yield! loop level
                    yield " "
                    yield! loop (i+1)
                    yield ")"
                else 
                    yield "(x x)"
        }
    loop 0 |> String.concat ""

let N = 5000
let s = generateString N;; 
let _ = fparseString s;;

How can I rewrite the parser to be tail-recursive?

I recognized the problem when trying to write a parser for a Lisp-like language and test it with real benchmarks. I have Term and VarBinding which are mutually recursive types and a let parser which exhibits the same issue as pApplication above. My parser is on github in case the analysis is wrong regarding the not tail-recursive problem.

解决方案

The built-in parser combinators of FParsec generally don't allow for a tail-recursive parser implementation, mainly because of the way the error handling is implemented.

This means that the recursion depth of an FParsec parser is limited by the stack size -- as is the case for most recursive descent parsers. Of course, this doesn't affect the parsing of sequences with many, sepBy, chainl etc., because these FParsec combinators all have implementations with constant stack space usage.

The default stack size in .NET is usually more than enough for parsing human-generated input in well-specified formats with FParsec (assuming you parse sequences with the built-in combinators). However, machine-generated input with a deeply nested structure (like your 5000 level deep s-expressions) can easily lead to a stack overflow.

If you know in advance that the nesting depth in your input is limited to a reasonable value, you could simply increase the stack size to an appropriate value. Hopefully this is true for your benchmark data.

Otherwise you may have to implement a special purpose parser function for the recursive element(s) of your grammar. You would need to implement this function using the low-level API of FParsec and you would obviously have to implement this function such that it uses the heap instead of the stack for keeping track of the nesting context and intermediate parsing results.

这篇关于FParsec中的尾递归的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆