How can I parse nested source files with ANTLR4?

Question

I've asked this question before (in a slightly different form), but at the time I didn't understand the answers well enough to give intelligent feedback (sigh).

I need to be able to include files inside other files at arbitrary points, so I need to be able to work with a stack of files that feed a single parse tree.

If I were writing this myself (and I have done so in the past), my parser would recognize "Include xyz" or "Import abc" and would cause the lexer to suspend reading from the current file, push that file onto a stack, and continue reading characters from the new file until it is exhausted.
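A minimal sketch of that hand-rolled scheme (hypothetical names, not the poster's actual code and not ANTLR-specific) could look like this: the parser or a lexer action calls pushInclude() when it recognizes an include directive, and nextChar() falls back to the previous file once the included one runs out.

    #include <fstream>
    #include <memory>
    #include <stack>
    #include <string>

    // Hypothetical helper: a stack of open files. The caller (parser or lexer
    // action) invokes pushInclude() when it sees "Include xyz"; nextChar()
    // resumes the suspended file once the included one is exhausted.
    class IncludeStack {
    public:
        explicit IncludeStack(const std::string &rootPath) { pushInclude(rootPath); }

        // Suspend the current file and start reading from 'path'.
        void pushInclude(const std::string &path) {
            files_.push(std::make_unique<std::ifstream>(path));
        }

        // Next character of the combined input, or -1 when everything is exhausted.
        int nextChar() {
            while (!files_.empty()) {
                int c = files_.top()->get();
                if (c != std::char_traits<char>::eof())
                    return c;
                files_.pop();   // included file done: resume the outer file
            }
            return -1;
        }

    private:
        std::stack<std::unique_ptr<std::ifstream>> files_;
    };

Because a suspended std::ifstream keeps its read position, reading simply continues right after the include directive once the inner file is popped.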

However, when using ANTLR4 (where, so far, I've avoided inserting any code into the grammar file itself) together with the visitor pattern, all I see is the tree that has already been created, which of course is too late.

I've found references to PUSHSTREAM as something that can be done in the lexer, but I cannot find an actual example and would really appreciate some help (either a pointer to an actual example that I perhaps missed when searching, or a short code sample if someone has one).

Note that I'm writing code in C++, not Java.

Thanks in advance.

Answer

Years ago I developed a solution for ANTLR 2.7 to parse Windows resource files (*.rc). Such files are structured very much like C/C++ header files and support preprocessor directives such as #if/#end/#pragma/#include.

For that I created a special character input stream (with nested char input streams) that implements a stack-based approach for include files. Whenever an include directive is found in the character input, a new stack entry is created holding the current input stream, its position, and its line/column information (so that local source locations can still be reported if a parsing problem is found). That entry is pushed onto a stack and a new input stream is created for the included file. Once that stream is exhausted, the top entry is popped off the stack and characters continue to be served from the saved position (just after the #include statement). The lexer only ever sees one continuous stream of characters.
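The answer's original code isn't shown here; the following is only a rough sketch of the same stack idea, with hypothetical names: the character source itself recognizes #include "file" directives, keeps a stack of open files with per-file line counters for error reporting, and pops an entry when its file is exhausted, so the consumer only ever receives one continuous character sequence.

    #include <deque>
    #include <fstream>
    #include <stdexcept>
    #include <string>

    // Hypothetical sketch of the nested-input idea: a stack of open files, each
    // with its own line counter. Include directives are recognized inside the
    // character source itself, so whatever consumes nextChar() (e.g. a lexer)
    // sees one continuous sequence of characters.
    class NestedCharSource {
    public:
        explicit NestedCharSource(const std::string &rootPath) { push(rootPath); }

        // Next character of the logical (spliced) input, or -1 at the very end.
        int nextChar() {
            while (!stack_.empty()) {
                Entry &top = stack_.back();
                if (top.pos < top.buffer.size())
                    return static_cast<unsigned char>(top.buffer[top.pos++]);
                if (!refill(top))
                    stack_.pop_back();   // file exhausted: resume the including file
            }
            return -1;
        }

        // Location inside the file currently being read (for error messages).
        std::string where() const {
            if (stack_.empty())
                return "<eof>";
            return stack_.back().path + ":" + std::to_string(stack_.back().line);
        }

    private:
        struct Entry {
            std::string path;
            std::ifstream in;
            std::string buffer;   // current line, including '\n'
            size_t pos = 0;
            size_t line = 0;
        };

        void push(const std::string &path) {
            stack_.emplace_back();   // deque: references to existing entries stay valid
            Entry &e = stack_.back();
            e.path = path;
            e.in.open(path);
            if (!e.in)
                throw std::runtime_error("cannot open " + path);
        }

        // Load the next line of 'e'; when that line is an include directive,
        // push a new entry instead. Returns false once 'e' has no more lines.
        bool refill(Entry &e) {
            std::string line;
            if (!std::getline(e.in, line))
                return false;
            ++e.line;
            const std::string directive = "#include \"";
            if (line.compare(0, directive.size(), directive) == 0) {
                std::string file = line.substr(directive.size());
                push(file.substr(0, file.find('"')));
                e.buffer.clear();   // serve nothing from this line;
                e.pos = 0;          // characters now come from the new file
            } else {
                e.buffer = line + "\n";
                e.pos = 0;
            }
            return true;
        }

        std::deque<Entry> stack_;
    };

Hooking something like this into ANTLR4's C++ runtime would mean either implementing the runtime's CharStream interface on top of it, or (more crudely) draining nextChar() into a string and handing that to antlr4::ANTLRInputStream; either way the grammar itself stays free of include handling.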
