How do I make a TreeParser in ANTLR3?


Problem description


I'm attempting to learn language parsing for fun...

I've created an ANTLR grammar that I believe will match a simple language I am hoping to implement. It will have the following syntax:

<FunctionName> ( <OptionalArguments>+) {
     <OptionalChildFunctions>+
 }

Actual Example:

ForEach(in:[1,2,3,4,5] as:"nextNumber") {
   Print(message:{nextNumber})
}

I believe I have the grammar working correctly to match this construct, and now I am attempting to build an Abstract Syntax Tree for the language.

Firstly, I must admit I'm not exactly sure HOW this tree should look. Secondly, I'm at a complete loss as to how to do this in my ANTLR grammar... I've been trying without much success for hours.

This is the current idea I'm going with for the tree:

                   FunctionName
                  /          \
           Attributes         \
               / \          /  \ 
            ID    /\    ChildFunctions
           / \   ID etc
          /   \
  Attribute  AttributeValue
        Type

This is my current ANTLR grammar file:

grammar Test;

options {output=AST;ASTLabelType=CommonTree;}

program : function ;
function : ID (OPEN_BRACKET (attribute (COMMA? attribute)*)? CLOSE_BRACKET)? (OPEN_BRACE function* CLOSE_BRACE)?;

attribute : ID COLON datatype;

datatype : NUMBER | STRING | BOOLEAN | array | lookup ;
array  :  OPEN_BOX (datatype (COMMA datatype)* )? CLOSE_BOX ;
lookup  : OPEN_BRACE (ID (PERIOD ID)*) CLOSE_BRACE;

NUMBER
 : ('+' | '-')? (INTEGER | FLOAT)
 ;

STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;

BOOLEAN
 : 'true' | 'TRUE' | 'false' | 'FALSE'
 ;

ID  : (LETTER|'_') (LETTER | INTEGER |'_')*
    ;

COMMENT
    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

WHITESPACE : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;} ;

COLON : ':' ;
COMMA : ',' ;
PERIOD  :  '.' ;

OPEN_BRACKET : '(' ;
CLOSE_BRACKET : ')' ;

OPEN_BRACE : '{' ; 
CLOSE_BRACE : '}' ;

OPEN_BOX : '[' ;
CLOSE_BOX : ']' ;

fragment
LETTER
 : 'a'..'z' | 'A'..'Z' 
 ;

fragment
INTEGER
 : '0'..'9'+
 ;

fragment
FLOAT
 : INTEGER+ '.' INTEGER*
 ;

fragment
ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    ;

ANY help / advice would be great. I've tried reading dozens of tutorials and nothing about the AST generation seems to stick :(

Solution

Step 1 is to make the tree look like the little graph that you posted. Right now, you don't have any tree construction operators, so you're going to end up with a flat list.

See tree construction on the antlr.org website.

You can use ANTLRWorks to see what you're getting for a parse tree and AST. Start adding tree construction operators and watch how things change.

EDIT / Additional Info:

Here's a process you can follow to give you a rough idea of how to do it:

  1. Download ANTLRWorks and use its graphing facilities. You will definitely want to see the parse tree and the AST before and after you make changes. Once you understand how everything works, you can use any IDE or editor you want.
  2. There are two basic operators for tree construction: the exclamation mark !, which tells ANTLR not to place the node in the AST, and the caret ^, which tells ANTLR to make something the root node. Start by going through each non-terminal rule and deciding which elements don't need to be in the AST. For example, you don't need commas or parentheses. Once you have all the information, you can populate a structure (or create your own AST structure) that provides it. Commas don't help any more, so add a ! to them. For example:

    function: ID (OPEN_BRACKET! (attribute (COMMA!? attribute)*)? CLOSE_BRACKET!)? (OPEN_BRACE! function* CLOSE_BRACE!)?;

  3. Take a look at the AST in ANTLRWorks before and after. Compare.

  4. Now decide which element should be the root node. It looks like you want ID to be the root node, so add a ^ after ID and compare in ANTLRWorks.
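To make the effect of the two operators concrete, here is a minimal sketch of a standalone ANTLR 3 grammar that parses the same `name:value` input three ways. The grammar name `OperatorDemo` and the rule names `flat`, `pruned` and `rooted` are made up for illustration and are not part of the question's grammar. The tree shapes in the comments use the notation printed by `toStringTree()`.

```antlr
grammar OperatorDemo;   // hypothetical name, for illustration only

options { output=AST; ASTLabelType=CommonTree; }

// No operators: every token becomes a sibling in a flat list.
//   in:1  ->  in : 1
flat   : ID COLON NUMBER ;

// '!' leaves COLON out of the AST, but the result is still flat.
//   in:1  ->  in 1
pruned : ID COLON! NUMBER ;

// '^' makes ID the root; the remaining nodes become its children.
//   in:1  ->  (in 1)
rooted : ID^ COLON! NUMBER ;

ID     : ('a'..'z' | 'A'..'Z' | '_')+ ;
NUMBER : '0'..'9'+ ;
COLON  : ':' ;
WS     : (' ' | '\t' | '\r' | '\n')+ { $channel=HIDDEN; } ;
```

Running each rule as the start rule in the ANTLRWorks interpreter or debugger should show the tree change from a flat list to a rooted subtree.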

Here are a few changes that bring it closer to what I think you want:

program : function ;
function : ID^ (OPEN_BRACKET! attributeList? CLOSE_BRACKET!)? (OPEN_BRACE! function* CLOSE_BRACE!)?;
attributeList:  (attribute (COMMA!? attribute)*);
attribute : ID COLON! datatype;
datatype : NUMBER | STRING | BOOLEAN | array | lookup ;
array  :  OPEN_BOX! (datatype^ (COMMA! datatype)* )? CLOSE_BOX!;
lookup  : OPEN_BRACE! (ID (PERIOD! ID)*) CLOSE_BRACE!;

With that under your belt, now go look at some of the tutorials.
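One thing the ! and ^ operators cannot do is create a node that has no token behind it, such as the Attributes node in the diagram from the question. A possible way to get such nodes is ANTLR 3's rewrite syntax (`->`) together with imaginary tokens. The following is only a sketch under assumptions: the token names `ATTRIBUTES`, `ATTRIBUTE`, `ARRAY` and `LOOKUP` are invented here, and the lexer rules are taken to be unchanged from the question.

```antlr
grammar Test;

options { output=AST; ASTLabelType=CommonTree; }

// Imaginary tokens: node types that exist only in the AST.
// These names are assumptions, not part of the original grammar.
tokens { ATTRIBUTES; ATTRIBUTE; ARRAY; LOOKUP; }

program : function ;

// The function name becomes the root; brackets and braces are simply
// not mentioned on the right-hand side, so they are dropped.
function
    : ID (OPEN_BRACKET attributeList? CLOSE_BRACKET)?
         (OPEN_BRACE function* CLOSE_BRACE)?
      -> ^(ID attributeList? function*)
    ;

// Group all attributes under a single ATTRIBUTES node.
attributeList
    : attribute (COMMA? attribute)* -> ^(ATTRIBUTES attribute+)
    ;

// in:[1,2]  ->  (ATTRIBUTE in (ARRAY 1 2))
attribute
    : ID COLON datatype -> ^(ATTRIBUTE ID datatype)
    ;

datatype : NUMBER | STRING | BOOLEAN | array | lookup ;

array
    : OPEN_BOX (datatype (COMMA datatype)*)? CLOSE_BOX -> ^(ARRAY datatype*)
    ;

// {a.b}  ->  (LOOKUP a b)
lookup
    : OPEN_BRACE ID (PERIOD ID)* CLOSE_BRACE -> ^(LOOKUP ID+)
    ;

// Lexer rules (NUMBER, STRING, BOOLEAN, ID, COMMA, ...) as in the question.
```

Explicit parent nodes like these give every subtree an unambiguous shape, which tends to make a later tree grammar easier to write. The trade-off is a more verbose parser grammar than the operator-only version.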
