ANTLR Parser with manual lexer


Question

I'm migrating a C#-based programming language compiler from a manual lexer/parser to Antlr.

Antlr has been giving me severe headaches: it mostly works, but the small parts that do not work are incredibly painful to solve.

I discovered that most of my headaches are caused by the lexer parts of Antlr, rather than the parser. Then I noticed parser grammar X; and realized that perhaps I could have my manually written lexer and then an Antlr generated parser.

So I'm looking for more documentation on this topic. I guess a custom ITokenStream could work, but there appears to be virtually no online documentation on this topic...

Recommended answer

I found out how. It might not be the best approach but it certainly seems to be working.


  1. Antlr parsers receive an ITokenStream parameter
  2. Antlr lexers are themselves ITokenSources
  3. ITokenSource is a significantly simpler interface than ITokenStream
  4. The simplest way to convert an ITokenSource to an ITokenStream is to use a CommonTokenStream, which receives an ITokenSource parameter

So now we only need to do 2 things:


  1. Adjust the grammar so that it is parser-only
  2. Implement ITokenSource

Adjusting the grammar is very simple. Remove all lexer declarations and make sure you declare the grammar as parser grammar. A simple example is posted here for convenience:

parser grammar mygrammar;

options
{
    language=CSharp2;
}

@parser::namespace { MyNamespace }

document:   (WORD {Console.WriteLine($WORD.text);} |
        NUMBER {Console.WriteLine($NUMBER.text);})*;

Note that the grammar above will output a class named mygrammar instead of mygrammarParser.

So now we want to implement a "fake" lexer. I personally used the following pseudo-code:

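// TokenQueue is the hand-written token source; it must implement ITokenSource
TokenQueue q = new TokenQueue();
// Run the hand-written lexer here and push every token it produces into q
CommonTokenStream cts = new CommonTokenStream(q); // wraps the ITokenSource in an ITokenStream
mygrammar g = new mygrammar(cts);                 // the Antlr-generated parser
g.document();                                     // invoke the start rule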



Finally, we need to define TokenQueue. TokenQueue is not strictly necessary but I used it for convenience. It should have methods to receive the lexer tokens, and methods to output Antlr tokens. So if not using Antlr native tokens one has to implement a convert-to-Antlr-token method. Also, TokenQueue must implement ITokenSource.
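
To make the shape of such a queue concrete, here is a minimal sketch. It is not the code from my compiler, and it deliberately avoids the Antlr runtime so that it compiles on its own: MyToken, ParserToken and ISimpleTokenSource are hypothetical stand-ins. In real code ParserToken would be the runtime's own token class and ISimpleTokenSource would be ITokenSource, whose exact members depend on the runtime version you use. The end-of-input type of -1 and the default channel of 0 follow the Antlr 3 conventions as I understand them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token produced by the hand-written lexer (not an Antlr type).
public sealed class MyToken
{
    public int Kind;      // must already match the token type numbers of the generated parser
    public string Text;
    public int Line;      // 1-based line number
    public int Column;    // 0-based column within the line
}

// Stand-in for the token object the parser consumes.
public sealed class ParserToken
{
    public const int EofType = -1;        // assumed end-of-input token type
    public const int DefaultChannel = 0;  // normal (non-hidden) channel

    public int Type;
    public string Text;
    public int Line;
    public int CharPositionInLine;
    public int Channel;
}

// Stand-in for the token-source interface: the consumer pulls one token at a time.
public interface ISimpleTokenSource
{
    ParserToken NextToken();
}

public sealed class TokenQueue : ISimpleTokenSource
{
    private readonly Queue<ParserToken> tokens = new Queue<ParserToken>();

    // Called by the hand-written lexer for every token it recognizes.
    public void Enqueue(MyToken t)
    {
        tokens.Enqueue(Convert(t));
    }

    // Called by the token stream. Once the queue is drained, every call
    // returns a fresh end-of-input token so the consumer can stop cleanly.
    public ParserToken NextToken()
    {
        if (tokens.Count > 0)
        {
            return tokens.Dequeue();
        }
        ParserToken eof = new ParserToken();
        eof.Type = ParserToken.EofType;
        eof.Text = "<EOF>";
        eof.Channel = ParserToken.DefaultChannel;
        return eof;
    }

    // The "convert to parser token" step: copy every field the parser may inspect.
    private static ParserToken Convert(MyToken t)
    {
        ParserToken p = new ParserToken();
        p.Type = t.Kind;
        p.Text = t.Text;
        p.Line = t.Line;
        p.CharPositionInLine = t.Column;
        p.Channel = ParserToken.DefaultChannel;
        return p;
    }
}
```

The one thing this sketch glosses over is the token type mapping: the generated parser compares token types against its own constants (WORD, NUMBER and so on), so the hand-written lexer has to emit exactly those numbers.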

Be aware that it is very important to set the token fields correctly. Initially, I had some problems because I was miscalculating CharPositionInLine. If these fields are set incorrectly, the parser may fail. Also, the normal (non-hidden) channel is 0.
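
As an illustration of the position bookkeeping, the sketch below derives a line and an in-line position from an absolute character offset. SourcePosition and Locate are hypothetical names, and the convention used (lines counted from 1, position in line counted from 0) is my understanding of what Antlr 3 expects, so check it against your runtime.

```csharp
using System;

public static class SourcePosition
{
    // Computes the 1-based line and the 0-based position in that line
    // of the character at 'offset' in 'source'.
    public static void Locate(string source, int offset, out int line, out int column)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (offset < 0 || offset > source.Length) throw new ArgumentOutOfRangeException("offset");

        line = 1;
        int lineStart = 0; // offset of the first character of the current line
        for (int i = 0; i < offset; i++)
        {
            if (source[i] == '\n')
            {
                line++;
                lineStart = i + 1;
            }
        }
        column = offset - lineStart;
    }
}
```

For the input "ab\ncd" and offset 4 (the character d) this yields line 2 and position 1. Only '\n' is treated as a line terminator, so with Windows line endings the '\r' simply counts as the last character of its line.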

This seems to be working for me so far. I hope others find it useful as well. I'm open to feedback. In particular, if you find a better way to solve this problem, feel free to post a separate reply.
