Interactive ANTLR
Question
I'm trying to write a simple interactive language (using System.in as the source) with ANTLR, and I have a few problems with it. The examples I've found on the web all use a per-line loop, e.g.:
while (readline)
    result = parse(line)
    doStuff(result)
But what if I'm writing something like Pascal or SMTP, with a requirement that the first line looks like X? I know it can be checked in doStuff, but I think logically it is part of the syntax.
Or what if a command is split across multiple lines? I could try:
while (readline)
    lines.add(line)
    try
        result = parse(lines)
        lines = []
        doStuff(result)
    catch
        nop
But with this I'm also hiding real errors.
Or I could reparse all lines every time, but:
- it will be slow
- there are instructions I don't want to run twice
Can this be done with ANTLR, or if not, with something else?
Recommended answer
Dutow wrote:
    Or I could reparse all lines every time, but:
    - it will be slow
    - there are instructions I don't want to run twice
    Can this be done with ANTLR, or if not, with something else?
Yes, ANTLR can do this. Perhaps not out of the box, but with a bit of custom code it is certainly possible. You also don't need to re-parse the entire token stream for it.
Let's say you want to parse a very simple language line by line, where each line is either a program declaration, a uses declaration, or a statement.
The input should always start with a program declaration, followed by zero or more uses declarations, followed by zero or more statements. A uses declaration cannot come after a statement, and there can't be more than one program declaration.
For simplicity, a statement is just a simple assignment: a = 4 or b = a.
An ANTLR grammar for such a language could look like this:
grammar REPL;
parse
: programDeclaration EOF
| usesDeclaration EOF
| statement EOF
;
programDeclaration
: PROGRAM ID
;
usesDeclaration
: USES idList
;
statement
: ID '=' (INT | ID)
;
idList
: ID (',' ID)*
;
PROGRAM : 'program';
USES : 'uses';
ID : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;
INT : '0'..'9'+;
SPACE : (' ' | '\t' | '\r' | '\n') {skip();};
But we'll need to add a couple of checks, of course. Also, by default a parser takes a token stream in its constructor, but since we're planning to trickle tokens into the parser line by line, we'll need to create a new constructor in our parser. You can add custom members to your parser or lexer classes by putting them in a @parser::members { ... } or @lexer::members { ... } section, respectively. We'll also add a couple of boolean flags to keep track of whether the program declaration has already happened and whether uses declarations are still allowed. Finally, we'll add a process(String source) method which, for each new line, creates a lexer whose tokens are fed to the parser.
All of that looks like:
@parser::members {

  boolean programDeclDone;
  boolean usesDeclAllowed;

  public REPLParser() {
    super(null);
    programDeclDone = false;
    usesDeclAllowed = true;
  }

  public void process(String source) throws Exception {
    ANTLRStringStream in = new ANTLRStringStream(source);
    REPLLexer lexer = new REPLLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    super.setTokenStream(tokens);
    this.parse(); // the entry point of our parser
  }
}
Now, inside our grammar, we're going to use a couple of semantic predicates (the {...}? blocks) to check whether declarations are being parsed in the correct order. And after parsing a certain declaration or statement, we'll want to flip certain boolean flags to allow or disallow declarations from then on. The flipping of these flags is done in each rule's @after { ... } section, which gets executed (not surprisingly) after the tokens of that parser rule have been matched.
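The interplay between the predicates and the @after sections can be illustrated without ANTLR. What follows is a minimal plain-Java sketch, not generated parser code: the class name OrderingSketch, the Kind enum, and the accept method are hypothetical names introduced only for illustration. Only the two flags, programDeclDone and usesDeclAllowed, come from the grammar.

```java
/** Plain-Java sketch of the ordering flags; this is NOT ANTLR-generated code. */
public class OrderingSketch {

    /** Hypothetical classification of one input line. */
    public enum Kind { PROGRAM, USES, STATEMENT }

    // The same two flags the parser keeps between calls to process(...).
    private boolean programDeclDone = false;
    private boolean usesDeclAllowed = true;

    /** Returns true if a line of this kind is allowed in the current state. */
    public boolean accept(Kind kind) {
        switch (kind) {
            case PROGRAM:
                if (programDeclDone) {
                    return false;            // mirrors {!programDeclDone}?
                }
                programDeclDone = true;      // mirrors @after of programDeclaration
                return true;
            case USES:
                // mirrors {programDeclDone}? in parse and {usesDeclAllowed}? in usesDeclaration
                return programDeclDone && usesDeclAllowed;
            case STATEMENT:
                if (!programDeclDone) {
                    return false;            // mirrors {programDeclDone}? in parse
                }
                usesDeclAllowed = false;     // mirrors @after of statement
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        OrderingSketch state = new OrderingSketch();
        Kind[] session = {
            Kind.PROGRAM, Kind.USES, Kind.STATEMENT, Kind.USES, Kind.PROGRAM
        };
        for (Kind kind : session) {
            System.out.println(kind + " -> " + (state.accept(kind) ? "accepted" : "rejected"));
        }
    }
}
```

In this sketch the first three lines of the session are accepted and the last two are rejected, because the state survives from one line to the next. That is the reason the parser object is created once and reused, while a fresh lexer is built for every line.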
Your final grammar file now looks like this (including some System.out.println calls for debugging purposes):
grammar REPL;
@parser::members {

  boolean programDeclDone;
  boolean usesDeclAllowed;

  public REPLParser() {
    super(null);
    programDeclDone = false;
    usesDeclAllowed = true;
  }

  public void process(String source) throws Exception {
    ANTLRStringStream in = new ANTLRStringStream(source);
    REPLLexer lexer = new REPLLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    super.setTokenStream(tokens);
    this.parse();
  }
}
parse
: programDeclaration EOF
| {programDeclDone}? (usesDeclaration | statement) EOF
;
programDeclaration
@after{
programDeclDone = true;
}
: {!programDeclDone}? PROGRAM ID {System.out.println("\t\t\t program <- " + $ID.text);}
;
usesDeclaration
: {usesDeclAllowed}? USES idList {System.out.println("\t\t\t uses <- " + $idList.text);}
;
statement
@after{
usesDeclAllowed = false;
}
: left=ID '=' right=(INT | ID) {System.out.println("\t\t\t " + $left.text + " <- " + $right.text);}
;
idList
: ID (',' ID)*
;
PROGRAM : 'program';
USES : 'uses';
ID : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;
INT : '0'..'9'+;
SPACE : (' ' | '\t' | '\r' | '\n') {skip();};
This can be tested with the following class:
import org.antlr.runtime.*;
import java.util.Scanner;

public class Main {

    public static void main(String[] args) throws Exception {

        Scanner keyboard = new Scanner(System.in);

        // one parser for the whole session, so its flags persist between lines
        REPLParser parser = new REPLParser();

        while(true) {
            System.out.print("\n> ");
            String input = keyboard.nextLine();
            if(input.equals("quit")) {
                break;
            }
            // lex and parse just this line
            parser.process(input);
        }

        System.out.println("\nBye!");
    }
}
To run this test class, do the following:
# generate a lexer and parser:
java -cp antlr-3.2.jar org.antlr.Tool REPL.g
# compile all .java source files:
javac -cp antlr-3.2.jar *.java
# run the main class on Windows:
java -cp .;antlr-3.2.jar Main
# or on Linux/Mac:
java -cp .:antlr-3.2.jar Main
As you can see, you can only declare a program once:
> program A
program <- A
> program B
line 1:0 rule programDeclaration failed predicate: {!programDeclDone}?
A uses declaration cannot come after a statement:
> program X
program <- X
> uses a,b,c
uses <- a,b,c
> a = 666
a <- 666
> uses d,e
line 1:0 rule usesDeclaration failed predicate: {usesDeclAllowed}?
And you must start with a program declaration:
> uses foo
line 1:0 rule parse failed predicate: {programDeclDone}?