ANTLR Parser with manual lexer
Question
I'm migrating a C#-based programming language compiler from a manual lexer/parser to Antlr.
Antlr has been giving me severe headaches: it mostly works, but the small parts that do not work are incredibly painful to solve.
I discovered that most of my headaches are caused by the lexer parts of Antlr rather than the parser. Then I noticed the declaration parser grammar X; and realized that perhaps I could keep my manually written lexer and pair it with an Antlr-generated parser.
So I'm looking for more documentation on this topic. I guess a custom ITokenStream could work, but there appears to be virtually no online documentation on this topic...
Recommended answer
I found out how. It might not be the best approach, but it certainly seems to be working.
- Antlr parsers receive an ITokenStream parameter.
- Antlr lexers are themselves ITokenSources.
- ITokenSource is a significantly simpler interface than ITokenStream.
- The simplest way to convert an ITokenSource to an ITokenStream is to use a CommonTokenStream, which receives an ITokenSource parameter.
So now we only need to do two things:
- Adjust the grammar so that it is parser-only
- Implement ITokenSource
Adjusting the grammar is very simple. Remove all lexer declarations and make sure you declare the grammar as parser grammar. A simple example is posted here for convenience:
parser grammar mygrammar;
options
{
language=CSharp2;
}
@parser::namespace { MyNamespace }
document: (WORD {Console.WriteLine($WORD.text);} |
NUMBER {Console.WriteLine($NUMBER.text);})*;
Note that the grammar above generates a class named mygrammar rather than mygrammarParser.
So now we want to implement a "fake" lexer. I personally used the following pseudo-code:
TokenQueue q = new TokenQueue();
//Do normal lexer stuff and output to q
CommonTokenStream cts = new CommonTokenStream(q);
mygrammar g = new mygrammar(cts);
g.document();
Finally, we need to define TokenQueue. TokenQueue is not strictly necessary, but I used it for convenience.
It should have methods to receive the lexer tokens and methods to output Antlr tokens. So if your lexer does not use Antlr's native tokens, you have to implement a convert-to-Antlr-token method.
Also, TokenQueue must implement ITokenSource.
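As a rough illustration of the TokenQueue described above, here is a minimal sketch written against the ANTLR 3 C# runtime (namespace Antlr.Runtime). It is not the author's actual class. The Enqueue helper and the SourceName value are hypothetical, and the sketch assumes the runtime's ITokenSource only requires NextToken and SourceName; some runtime versions may declare additional members (for example a token-names property), which you would need to add if the compiler asks for them.

```csharp
using System.Collections.Generic;
using Antlr.Runtime;

// Sketch of a queue that a hand-written lexer fills and an ANTLR parser drains.
public class TokenQueue : ITokenSource
{
    private const int EofType = -1;        // ANTLR 3 end-of-file token type
    private const int DefaultChannel = 0;  // the normal (non-hidden) channel

    private readonly Queue<IToken> tokens = new Queue<IToken>();

    // Hypothetical helper: the hand-written lexer calls this for each token.
    // 'type' should be one of the token type constants of the generated parser.
    public void Enqueue(int type, string text, int line, int charPositionInLine)
    {
        CommonToken token = new CommonToken(type, text);
        token.Line = line;                             // 1-based line number
        token.CharPositionInLine = charPositionInLine; // 0-based column
        token.Channel = DefaultChannel;
        tokens.Enqueue(token);
    }

    // Called by CommonTokenStream whenever it needs another token.
    public IToken NextToken()
    {
        if (tokens.Count > 0)
            return tokens.Dequeue();
        return new CommonToken(EofType); // keep answering EOF once drained
    }

    public string SourceName
    {
        get { return "TokenQueue"; }
    }
}
```

With the grammar shown earlier, the lexer side would then call something like q.Enqueue(mygrammar.WORD, "hello", 1, 0), assuming the generated parser class exposes WORD and NUMBER as integer constants.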
Be aware that it is very important to set the token variables correctly. Initially, I had some problems because I was miscalculating CharPositionInLine. If these variables are set incorrectly, the parser may fail.
Also, the normal (not hidden) channel is 0.
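To make the position bookkeeping concrete, here is a small self-contained sketch of how a hand-written lexer might track these values. It follows the ANTLR convention that line numbers start at 1 and CharPositionInLine starts at 0. The RawToken and PositionTrackingLexer names are hypothetical, and the whitespace-splitting rule is only a stand-in for a real lexer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token record produced by a hand-written lexer.
public sealed class RawToken
{
    public string Text;
    public int Line;               // 1-based, as ANTLR expects
    public int CharPositionInLine; // 0-based column of the first character
}

public static class PositionTrackingLexer
{
    // Splits the input on whitespace and records where each token starts.
    public static List<RawToken> Lex(string input)
    {
        var tokens = new List<RawToken>();
        int line = 1, column = 0, i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (c == '\n') { line++; column = 0; i++; continue; }   // new line resets the column
            if (char.IsWhiteSpace(c)) { column++; i++; continue; }  // skip other whitespace

            int start = i, startColumn = column;
            while (i < input.Length && !char.IsWhiteSpace(input[i])) { i++; column++; }
            tokens.Add(new RawToken
            {
                Text = input.Substring(start, i - start),
                Line = line,
                CharPositionInLine = startColumn
            });
        }
        return tokens;
    }
}
```

The key point is that the column is captured before the token's characters are consumed, and is reset to 0 on every newline; getting either step wrong produces the kind of miscalculated CharPositionInLine mentioned above.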
This seems to be working for me so far. I hope others find it useful as well. I'm open to feedback. In particular, if you find a better way to solve this problem, feel free to post a separate reply.