Antlr 解析器 StackOverflowException [英] Antlr parser StackOverflowException

查看:23
本文介绍了Antlr 解析器 StackOverflowException的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的 Xamarin.ios C# 项目的 Antlr 解析器语法中有以下内容:

I have the following in my Antlr parser grammar for my Xamarin.ios C# project:

mathToken  
    : DIGIT             #Digit
    | NULL              #Null
    | LESSTHAN          #LessThan
    | GREATERTHAN       #GreaterThan
    | anyLessThanOrEqual   #LessThanOrEqual
    // about 30 more options here

mathTokenList
    :   mathToken mathTokenList   #CompoundMathTokens
    |   mathToken                 #SingleMathToken
    ;

这对于包含 10 个、100 甚至 1000 个标记的列表非常有效.但是一旦列表变得足够长,就会导致 StackOverflowException,因为生成的 MathTokenList 递归调用自身,并在顶部带有一些侦听器代码:

This works great for a list of 10 tokens, 100, or even 1000. But once the list gets long enough, it leads to a StackOverflowException, as the generated MathTokenList recursively calls itself, with some listener code at the top:

MyNamespace.HandleToken(MyTokenClass parserToken, List<MyOtherTokenClass> buildingList) in MyNamespace.ManualFileParser.cs:58
MyNamespace.CustomStringReaderParseListener.VisitDefault(Antlr4.Runtime.Tree.TerminalNodeImpl node) in MyNamespace.CustomStringReaderParseListener.cs:228
MyNamespace.CustomStringReaderParseListener.VisitTerminal(Antlr4.Runtime.Tree.TerminalNodeImpl node) in MyNamespace.CustomStringReaderParseListener.cs:94
Antlr4.Runtime.Parser.Consume()
Antlr4.Runtime.Parser.Match(int ttype)  
Antlr.StringReaderParser.mathToken() 
Antlr.StringReaderParser.mathTokenList() // lots of calls here . . . seems to be
Antlr.StringReaderParser.mathTokenList() // one for each token.
Antlr.StringReaderParser.mathTokenList() // ( in antlr generated code)

是否可以重构解析器语法来避免此类问题?或者我是否需要做一些更复杂的事情,以便解析器永远不会看到很长的 mathToken 列表?

Is it possible to restructure the parser grammar to avoid this kind of problem? Or do I need to do something more involved so that the parser never sees a really long list of mathTokens?

我可以通过在解析器看到数字之前组合数字列表来解决这个问题.但它最终可能会以其他标记类型再次出现.

I could put a band-aid on the problem by combining lists of digits before the parser sees them. But it would likely recur eventually with some other token type.

推荐答案

您无法完全避免这个问题.每个规则调用实际上是生成解析器中的一个函数调用(递归下降解析器原理).如果您的输入足够复杂,您肯定会达到可用堆栈限制.在大多数(所有?)编译器中,您可以增加应用的堆栈空间,但这也只能在一定程度上有所帮助.

You cannot entirely avoid this problem. Each rule invocation is actually a function call in the generated parser (recursive descent parser principle). If your input is only complex enough you will certainly hit the available stack limit. In most (all?) compilers you can increase the stackspace for your app, but this also helps only up to a certain extent.

但是,正如@BartKiers 建议的那样,您可以通过使用循环而不是递归(编程中经常使用的原则)来降低调用次数.mathTokenList 规则与您在 yacc/bison 中定义列表的方式非常相似.在 ANTLR 中,您可以改用循环运算符,这样可以更好地读取并减少资源占用:

However, as @BartKiers suggested you can lower the invocation count by using loops instead of recursions (a principle often used in programming). The mathTokenList rule looks very much like how you would define a list in yacc/bison. In ANTLR you can use loop operators instead, which make this better to read and less resource intensive:

mathTokenList: mathToken+;

是去这里的方式.这将最终在您的 mathTokenList 方法中执行循环(看看生成的解析器代码,有时很有启发性).

is the way to go here. This will end up in a loop being executed in your mathTokenList method (look at the generated parser code, sometimes quite enlightening).

这篇关于Antlr 解析器 StackOverflowException的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆