ANTLR 4: Should all of these tokens be showing up in the AST?

Question

My ultimate goal is to parse a structured file as a tree of in-memory objects that I can then manipulate. The file format that I'm using is fairly sophisticated with about 200 keywords/tags, and this seemed like a good reason to learn about parser/lexer frameworks.

Unfortunately, there are so many concepts (and hundreds of tutorials and guides) that the learning process so far feels like trying to drink from a fire hose. So I'm taking some very meager baby steps, starting with this example.

I modified the grammar to create the following test, Nano.g4:

grammar Nano;

r  : root ;
root : START ROOT ID END ROOT;
START : 'StartBlock' ;
END : 'EndBlock' ;
ROOT : 'RootItem' ;
ID : [a-z]+ ;             // match lower-case identifiers
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

Next, I created a simple input file, nano.txt:

StartBlock RootItem
   foo
EndBlock RootItem

I then loaded the code using the following commands:

del *.class
del *.java
java org.antlr.v4.Tool Nano.g4
javac Nano*.java
java org.antlr.v4.runtime.misc.TestRig Nano r -gui < nano.txt

This gave me the following result:

The tree above is my first conceptual hangup about what to expect from a lexer and parser. The "StartBlock RootItem" and "EndBlock RootItem" tokens are necessary in terms of making the input file legal, but conceptually I don't need them after I've proven that the file is properly formatted. The only thing that I care about from that point on is that there's a RootItem that contains "foo", as shown here:

Again, I'm painfully new to parser/lexer concepts. Should I (or, is it even possible to) write the grammar so the output tree matches the image above? Or should I take care of that in some subsequent step that traverses the AST and only extracts the relevant data fields?

Answer

ANTLR 4 produces parse trees, not ASTs. This is an important distinction from the behavior of ANTLR 3, and was chosen to help with long-term maintenance of grammars. In particular, situations may arise where users do want access to the tokens, e.g. as part of a semantic highlighting component in an IDE. Rather than force users to write application-specific modified grammars in such a scenario, we chose to always include all tokens in the parse tree.
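Since every token stays in the parse tree, reducing it to the data you care about is typically done in a separate pass over the tree. The sketch below illustrates that idea only; it does not use the ANTLR runtime. The `Node` and `RootItem` classes and the `sampleTree` and `toAst` methods are hypothetical names invented for this example, and the tree is built by hand to mirror what the Nano grammar would produce for nano.txt.

```java
import java.util.ArrayList;
import java.util.List;

public class ParseTreeToAst {

    /** Hypothetical stand-in for a parse-tree node: a rule node with children, or a token leaf. */
    static final class Node {
        final String type;   // rule name ("r", "root") or token type ("START", "ID", ...)
        final String text;   // matched text for token leaves, null for rule nodes
        final List<Node> children = new ArrayList<>();

        Node(String type, String text) { this.type = type; this.text = text; }

        Node add(Node child) { children.add(child); return this; }
    }

    /** Hypothetical application-level object: the only data we want to keep. */
    static final class RootItem {
        final String name;
        RootItem(String name) { this.name = name; }
        @Override public String toString() { return "RootItem(" + name + ")"; }
    }

    /** Hand-built tree for the nano.txt input, keyword tokens included. */
    static Node sampleTree() {
        Node root = new Node("root", null)
            .add(new Node("START", "StartBlock"))
            .add(new Node("ROOT", "RootItem"))
            .add(new Node("ID", "foo"))
            .add(new Node("END", "EndBlock"))
            .add(new Node("ROOT", "RootItem"));
        return new Node("r", null).add(root);
    }

    /** Walks the tree and keeps only the ID under a "root" node, dropping the keyword tokens. */
    static RootItem toAst(Node node) {
        if (node.type.equals("root")) {
            for (Node child : node.children) {
                if (child.type.equals("ID")) return new RootItem(child.text);
            }
            throw new IllegalStateException("root node without an ID child");
        }
        // Not a "root" rule node: search the children depth-first.
        for (Node child : node.children) {
            RootItem found = toAst(child);
            if (found != null) return found;
        }
        return null; // token leaf or a subtree without a "root" node
    }

    public static void main(String[] args) {
        System.out.println(toAst(sampleTree())); // prints RootItem(foo)
    }
}
```

With the real ANTLR 4 runtime, the same extraction would normally live in a subclass of the generated NanoBaseListener (or NanoBaseVisitor, if the tool is run with -visitor), reading the identifier from the rule's context object, for example via ID().getText() on the context for root. The grammar itself stays free of application-specific tree shaping.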
