Interactive ANTLR


Question

I'm trying to write a simple interactive language (using System.in as the source) with ANTLR, and I've run into a few problems. The examples I've found on the web all use a per-line loop, e.g.:

while(readline)
  result = parse(line)
  doStuff(result)

But what if I'm writing something like Pascal/SMTP/etc., with a requirement that the first line looks like X? I know this can be checked in doStuff, but I think logically it is part of the syntax.

Or what if a command is split across multiple lines? I can try

while(readline)
  lines.add(line)
  try
    result = parse(lines)
    lines = []
    doStuff(result)
  catch
    nop

But with this I'm also hiding real errors.
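One way around hiding real errors is to tell "ran out of input" apart from "saw the wrong input". The following is only a minimal plain-Java sketch of that idea, not ANTLR code: the class IncrementalInput, its Status enum and its feed method are hypothetical names, and it assumes whitespace-separated tokens and a single command form, ID = (INT | ID). Only an INCOMPLETE result keeps the buffer; an ERROR is reported immediately instead of being swallowed.

```java
import java.util.ArrayList;
import java.util.List;

public class IncrementalInput {

    /** Outcome of checking everything buffered so far (hypothetical enum). */
    public enum Status { COMPLETE, INCOMPLETE, ERROR }

    // Tokens collected from all lines of the command currently being typed.
    private final List<String> tokens = new ArrayList<>();

    // Expected token patterns for the single command form: ID '=' (INT | ID).
    private static final String[] EXPECTED = {
        "[A-Za-z_][A-Za-z_0-9]*",
        "=",
        "[0-9]+|[A-Za-z_][A-Za-z_0-9]*"
    };

    /** Adds one line (assumed whitespace-separated tokens) and checks the buffer. */
    public Status feed(String line) {
        for (String t : line.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        Status status = check();
        if (status != Status.INCOMPLETE) {
            tokens.clear(); // a finished or broken command starts a fresh buffer
        }
        return status;
    }

    private Status check() {
        for (int i = 0; i < tokens.size(); i++) {
            // A wrong or surplus token is a real error, not missing input.
            if (i >= EXPECTED.length || !tokens.get(i).matches(EXPECTED[i])) {
                return Status.ERROR;
            }
        }
        // Everything so far is valid; decide whether more tokens are needed.
        return tokens.size() == EXPECTED.length ? Status.COMPLETE : Status.INCOMPLETE;
    }

    public static void main(String[] args) {
        IncrementalInput in = new IncrementalInput();
        System.out.println(in.feed("a"));       // INCOMPLETE
        System.out.println(in.feed("= 4"));     // COMPLETE
        System.out.println(in.feed("a = = 4")); // ERROR
    }
}
```

With a generated parser, the same distinction might be made by checking whether the token that triggered the error is the end-of-input token; that is an assumption about how one could adapt the idea, not something shown in this thread.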

Or I could reparse all lines every time, but:

  1. it will be slow
  2. there are instructions I don't want to run twice

Can this be done with ANTLR or, if not, with something else?

Recommended answer

Dutow wrote:

Or I could reparse all lines every time, but:

  1. it will be slow
  2. there are instructions I don't want to run twice

Can this be done with ANTLR or, if not, with something else?

Yes, ANTLR can do this. Perhaps not out of the box, but with a bit of custom code it is certainly possible. You also don't need to re-parse the entire token stream for it.

Let's say you want to parse a very simple language line by line, where each line is either a program declaration, a uses declaration, or a statement.

The input should always start with a program declaration, followed by zero or more uses declarations, followed by zero or more statements. uses declarations cannot come after statements, and there can't be more than one program declaration.

For simplicity, a statement is just a simple assignment: a = 4 or b = a.

An ANTLR grammar for such a language could look like this:

grammar REPL;

parse
  :  programDeclaration EOF
  |  usesDeclaration EOF
  |  statement EOF
  ;

programDeclaration
  :  PROGRAM ID
  ;

usesDeclaration
  :  USES idList
  ;

statement
  :  ID '=' (INT | ID)
  ;

idList
  :  ID (',' ID)*
  ;

PROGRAM : 'program';
USES    : 'uses';
ID      : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;
INT     : '0'..'9'+;
SPACE   : (' ' | '\t' | '\r' | '\n') {skip();};

But we'll need to add a couple of checks, of course. Also, by default, a parser takes a token stream in its constructor, but since we're planning to trickle tokens into the parser line by line, we'll need to create a new constructor in our parser. You can add custom members to your lexer or parser classes by putting them in a @lexer::members { ... } or @parser::members { ... } section respectively. We'll also add a couple of boolean flags to keep track of whether the program declaration has already happened and whether uses declarations are still allowed. Finally, we'll add a process(String source) method which, for each new line, creates a lexer that gets fed to the parser.

All of this looks like:

@parser::members {

  boolean programDeclDone;
  boolean usesDeclAllowed;

  public REPLParser() {
    super(null);
    programDeclDone = false;
    usesDeclAllowed = true;
  }

  public void process(String source) throws Exception {
    ANTLRStringStream in = new ANTLRStringStream(source);
    REPLLexer lexer = new REPLLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    super.setTokenStream(tokens);
    this.parse(); // the entry point of our parser
  } 
}

Now, inside our grammar, we're going to use a couple of semantic predicates (the {...}? blocks) to check whether we're parsing declarations in the correct order. After parsing a certain declaration or statement, we'll want to flip certain boolean flags to allow or disallow declarations from then on. The flipping of these boolean flags is done in each rule's @after { ... } section, which gets executed (not surprisingly) after the tokens of that parser rule have been matched.
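To make the flag logic easier to follow on its own, here is a minimal plain-Java sketch of the same ordering rules without any ANTLR involved. The class OrderChecker, the Kind enum and the accept method are hypothetical names introduced for illustration; only the two flag names, programDeclDone and usesDeclAllowed, are taken from the parser members above. Each check plays the role of a predicate, and each flag update plays the role of an @after action.

```java
public class OrderChecker {

    /** The three kinds of line in the example language (hypothetical enum). */
    public enum Kind { PROGRAM, USES, STATEMENT }

    // Same two flags as in the parser members, with the same initial values.
    private boolean programDeclDone = false;
    private boolean usesDeclAllowed = true;

    /** Returns true if a line of this kind is legal now, updating the flags. */
    public boolean accept(Kind kind) {
        switch (kind) {
            case PROGRAM:
                if (programDeclDone) {
                    return false;            // only one program declaration
                }
                programDeclDone = true;
                return true;
            case USES:
                // Needs a preceding program declaration and no statement yet.
                return programDeclDone && usesDeclAllowed;
            case STATEMENT:
                if (!programDeclDone) {
                    return false;            // program declaration must come first
                }
                usesDeclAllowed = false;     // no uses declarations from now on
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        OrderChecker checker = new OrderChecker();
        System.out.println(checker.accept(Kind.USES));      // false
        System.out.println(checker.accept(Kind.PROGRAM));   // true
        System.out.println(checker.accept(Kind.USES));      // true
        System.out.println(checker.accept(Kind.STATEMENT)); // true
        System.out.println(checker.accept(Kind.USES));      // false
        System.out.println(checker.accept(Kind.PROGRAM));   // false
    }
}
```

Because the flags live in an object that survives between lines, each line is checked exactly once and nothing has to be re-parsed.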

Your final grammar file now looks like this (including some System.out.println calls for debugging purposes):

grammar REPL;

@parser::members {

  boolean programDeclDone;
  boolean usesDeclAllowed;

  public REPLParser() {
    super(null);
    programDeclDone = false;
    usesDeclAllowed = true;
  }

  public void process(String source) throws Exception {
    ANTLRStringStream in = new ANTLRStringStream(source);
    REPLLexer lexer = new REPLLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    super.setTokenStream(tokens);
    this.parse();
  } 
}

parse
  :  programDeclaration EOF
  |  {programDeclDone}? (usesDeclaration | statement) EOF
  ;

programDeclaration
@after{
  programDeclDone = true;
}
  :  {!programDeclDone}? PROGRAM ID {System.out.println("\t\t\t program <- " + $ID.text);}
  ;

usesDeclaration
  :  {usesDeclAllowed}? USES idList {System.out.println("\t\t\t uses <- " + $idList.text);}
  ;

statement
@after{
  usesDeclAllowed = false; 
}
  :  left=ID '=' right=(INT | ID) {System.out.println("\t\t\t " + $left.text + " <- " + $right.text);}
  ;

idList
  :  ID (',' ID)*
  ;

PROGRAM : 'program';
USES    : 'uses';
ID      : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;
INT     : '0'..'9'+;
SPACE   : (' ' | '\t' | '\r' | '\n') {skip();};

This can be tested with the following class:

import org.antlr.runtime.*;
import java.util.Scanner;

public class Main {
    public static void main(String[] args) throws Exception {
        Scanner keyboard = new Scanner(System.in);
        REPLParser parser = new REPLParser();
        while(true) {
            System.out.print("\n> ");
            if(!keyboard.hasNextLine()) {
                break; // stop cleanly when System.in is closed (end of input)
            }
            String input = keyboard.nextLine();
            if(input.equals("quit")) {
                break;
            }
            parser.process(input);
        }
        System.out.println("\nBye!");
    }
}

To run this test class, do the following:

# generate a lexer and parser:
java -cp antlr-3.2.jar org.antlr.Tool REPL.g

# compile all .java source files:
javac -cp antlr-3.2.jar *.java

# run the main class on Windows:
java -cp .;antlr-3.2.jar Main 
# or on Linux/Mac:
java -cp .:antlr-3.2.jar Main

As you can see, you can only declare a program once:

> program A
                         program <- A

> program B
line 1:0 rule programDeclaration failed predicate: {!programDeclDone}?

uses declarations cannot come after statements:

> program X
                         program <- X

> uses a,b,c
                         uses <- a,b,c

> a = 666
                         a <- 666

> uses d,e
line 1:0 rule usesDeclaration failed predicate: {usesDeclAllowed}?

And you must start with a program declaration:

> uses foo
line 1:0 rule parse failed predicate: {programDeclDone}?
