LALR(1)语法中的错误恢复 [英] Error recovery in an LALR(1) grammar

查看:65
本文介绍了LALR(1)语法中的错误恢复的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用一些解析器和词法分析器生成工具(类似于Lex和Bison,但对于C#)来生成程序,该程序将字符串解析为抽象语法树,以便稍后进行评估.

我想进行错误恢复(即在生成的抽象句子树中报告缺少标记等).我有两种方法来构造生成的语法,我想知道哪种方法更好/更灵活/不会有冲突(.y和.lex文件是根据描述生成的计算器).

计算器描述允许用户为操作员和关联性指定终端/正则表达式及其位置.像这样:

grammar.AddTerminal("Plus", "\\+").
    AddNonTerminal(new NonTerminal("Add", Associativity.LeftToRight).
        AddTerminal("Expression").
        AddTerminal("Plus").
        AddTerminal("Expression"));

(优先级是通过添加TerminalNonTerminal的顺序指定的."Add"是通过反射发现的方法的名称.基本上,它告诉NonTerminal调用什么方法.操作符在抽象语法树中.)


方法1 :(允许对任何表达式使用空规则)

S -> E
E -> E + T
E -> T
T -> T * P
T -> P
P -> (E)
P -> (E [error]
P -> a
P -> @ [error]


a是终端. @为空.


方法2 :(仅允许将空规则用作开始规则)

S -> E
S -> @ [error]
E -> + [error]
E -> T + [error]
E -> + T [error]
E -> E + T
E -> T
T -> * [error]
T -> * P [error]
T -> P * [error]
T -> T * P
T -> P
P -> (E)
P -> (E [error]
P -> a

下面是一个示例,显示使用每种方法对错误输入的最左推导.


输入:(a +


方法1 :

S
E
T
P
(E
(E + T
(T + T
(P + T
(a + T
(a + P
(a +


方法2 :

S
E
T
P
(E
(T +
(P +
(a +


方法2很难编码(考虑减法/一元负运算符.您不能只看减法A -> A - B,先取出第一个A并报告A -> - B的错误,因为这对于我今天早上为方法2编码,只是发现我认为它存在语法问题,并且方法1中的空规则使事情变得更简单,更代码化,但是我主要关心的是这种方法将产生最少的语法问题,因为程序员如上所述创建了计算器描述.

解决方案

这个问题已经存在了一段时间,涵盖了该主题的初学者经常访问的一个主题.人们常常发现,那些已经完成了本科学历的编译器课程的人知道,这是那些没有简单答案或单一答案的问题之一.您可能已经注意到,您有关于同一主题的两个问题,但均未得到回答. 别人发布的另一个问题得到了回答指向解释为什么如此困难的文献的指针.

这个问题已经存在了50多年了.如果人们随着时间的流逝,从早期的会议论文,课程教科书,博士学位论文和(今天)SO中审查文献,我们可以看到经常引用这是一个错误问题的事实! (或者说是解决问题的方法错误).

只需从多年来的课程文本中取样(从我的书架中随机选择):

Gries,D.(1970)错误恢复与纠正-文献介绍​​,由Bauer,F.L.编辑的《 编译器构造,高级课程》 . & Eickel,J.,Springer Verlag,第627-638页.
Gries,D.(1971)数字计算机的编译器构造,Wiley,第320-326页.
Aho,A.V.,Ullman,J.D.(1977)编译器设计原理,Addison Wesley,第397-405页.
Bornat,R.(1979)理解和编写编译器,Macmillan,第251-252页.
Hanson,D.(1995)可重定位的 C 编译器:设计与实现,Addison-Wesley,第140-146页. Grune,D.,Bal,H.E.,Jacobs,C.J.H. &兰根多恩(K.G.) (2000) Modern Compiler Design ,Wiley,第175-184页.
Aho,A.V.,Lam,M.S.,Sethi,R.,Ullman,J.D.(2007) Compilers:Principles,Techniques,&工具,皮尔逊(Pearson),艾迪生·韦斯利(Addison-Wesley),第283-296页.

所有这些都同意(超过40年),您的问题是关于错误地使用错误的工具或方向错误的.我想我想说的是你不能从这里到那里".您应该从其他地方开始.

如果您想要更深入的研究,请阅读完整的博士学位论文:

查尔斯(1991)一种构造有效LALR的实用方法)纽约大学具有自动错误恢复功能的解析器

希望对以后再次访问此问题的人来说,有一个占位符可以回答这些问题.


我从评论中注意到您正在使用从CPPG派生的MPPG.并非所有人都会使用这些工具,因此我提供了这些工具的几个链接:

托管的Babel系统必备内容
花园点解析器生成器
Irony .NET编译器构建套件
编写第一个Visual Studio语言服务

I'm using some parser and lexer generating tools (similar to Lex and Bison, but for C#) to generate programs that parse strings into abstract syntax trees that can later be evaluated.

I wanted to do error recovery (i.e. report in the produced abstract sentence tree that there are missing tokens and such). I had two approaches in mind to structuring the generated grammars, and I was wondering which approach was better/more flexible/wouldn't have conflicts (the .y and .lex files are generated based on a description of the calculator).

The calculator description allows the user to specify terminals/regex's and their placement for operators and the associativity. So something like:

grammar.AddTerminal("Plus", "\\+").
    AddNonTerminal(new NonTerminal("Add", Associativity.LeftToRight).
        AddTerminal("Expression").
        AddTerminal("Plus").
        AddTerminal("Expression"));

(Precedence is specified via the order that the Terminal's and NonTerminal's are added. "Add" is the name of a method that is discovered via Reflection. Basically it tells the NonTerminal what to call the operator in the abstract syntax tree.)


Approach 1: (allow the empty rule for any expression)

S -> E
E -> E + T
E -> T
T -> T * P
T -> P
P -> (E)
P -> (E [error]
P -> a
P -> @ [error]


a is a terminal. @ is empty.


Approach 2: (only allow the empty rule for the start rule)

S -> E
S -> @ [error]
E -> + [error]
E -> T + [error]
E -> + T [error]
E -> E + T
E -> T
T -> * [error]
T -> * P [error]
T -> P * [error]
T -> T * P
T -> P
P -> (E)
P -> (E [error]
P -> a

Here's an example to show a left-most derivation for a bad-input using each approach.


Input: (a +


Approach 1:

S
E
T
P
(E
(E + T
(T + T
(P + T
(a + T
(a + P
(a +


Approach 2:

S
E
T
P
(E
(T +
(P +
(a +


Approach 2 is much harder to code for (consider subtraction/unary negative operator. You can't just look at subtract A -> A - B, take out that first A and report an error on A -> - B because that's valid for the unary operator.) I coded for approach 2 this morning only to find out that I think it has grammar issues and that an empty rule as in Approach 1 makes things much simpler, code-wise, but my main concern is which approach would produce the least amount of grammar issues as programmers create calculator descriptions as described above.

解决方案

This question has been around for a while and covers a topic that is is often visited by beginners in the subject. One often find that those who have done a compilers course in their undergraduate degree know that this is one of those questions that has no easy or single answer. You might have noticed that you have two questions on the same topic, neither of which has been answered. Another question someone else posted was answered with pointers to the literature that explains why this is hard.

It is a question that has remained extant for over 50 years. If one examines the literature over time, from early conference papers, course textbooks, doctoral thesis and (today) SO, we can see regular references to the fact that this is the wrong question! (Or rather the wrong approach to the problem).

Just taking a sample from course texts over the years (random selections from my bookshelf):

Gries, D. (1970) Error Recovery and Correction - An introduction to the Literature, In Compiler Construction, An advanced Course edited by Bauer, F.L. & Eickel, J., Springer Verlag, pp.627-638.
Gries, D. (1971) Compiler Construction for Digital Computers, Wiley, pp.320-326.
Aho, A.V., Ullman, J.D. (1977) Principles of Compiler Design, Addison Wesley, pp.397-405.
Bornat, R. (1979) Understanding and Writing Compilers, Macmillan, pp.251-252.
Hanson, D. (1995) A retargetable C compiler: Design and Implementation,Addison-Wesley, pp.140-146.
Grune, D., Bal, H.E., Jacobs, C.J.H. & Langendoen, K.G. (2000) Modern Compiler Design, Wiley, pp.175-184.
Aho, A.V., Lam, M.S., Sethi, R., Ullman, J.D. (2007) Compilers: Principles, Techniques, & Tools, Pearson, Addison-Wesley, pp.283-296.

All of these concur (over a 40 year span) that your question is about using the wrong tools wrongly or going in the wrong direction. I think I'm trying to say "You can't there from here". You should start from somewhere else.

If you want something deeper, there is a whole Phd thesis:

Charles, P. (1991) A Practical method for Constructing Efficient LALR(k) Parsers with Automatic Error Recovery, New York University

Hopefully, for those who visit this question again in the future, there is a placeholder for the answers.


I note from comments that you are using MPPG, derived from CPPG. Not everyone will have used these, so I'm including a couple of links to these tools:

Managed Babel Systems Essentials
Garden Points Parser Generator
Irony .NET compiler Construction Kit
Writing your first Visual Studio Language Service

这篇关于LALR(1)语法中的错误恢复的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆