Parse Parenthesis as atoms ANTLR

Problem description

I'm trying to match balanced parentheses such that a PARAMS tree is created if a match is made; otherwise the LPARAM and RPARAM tokens are simply added as atoms to the tree...

tokens
{
    LIST;    
    PARAMS;
}

start   : list -> ^(LIST list);

list    : (expr|atom)+;

expr : LPARAM list? RPARAM -> ^(PARAMS list?);

atom :  INT | LPARAM | RPARAM;

INT :   '0'..'9'+;
LPARAM  :   '(';
RPARAM  :   ')';

At the moment, it will never create a PARAMS tree, because in the rule expr it will always see the end RPARAM as an atom, rather than the closing token for that rule.

So at the moment, something like 1 2 3 (4) 5 is added to a LIST tree as a flat list of tokens, rather than the required grouping.

I've handled adding tokens as atoms to a tree before, but they never were able to start another rule, as LPARAM does here.

Do I need some sort of syntactic/semantic predicate here?

Solution

Here is a simple approach that comes with a couple of constraints. I think these conform to the expected behavior that you mentioned in the comments.

  • An unmatched LPARAM never appears inside a child list
  • An unmatched RPARAM never appears inside a child list

Grammar:

start   : root+ EOF -> ^(LIST root+ );

root    : expr
        | LPARAM
        | RPARAM
        ;

expr    : list
        | atom
        ;           

list    : LPARAM expr+ RPARAM -> ^(LIST expr+)
        ;

atom    : INT
        ;

Rule root matches mismatched LPARAMs and RPARAMs. Rules list and atom only care about themselves.

This solution is relatively fragile because rule root requires expr to be listed before LPARAM and RPARAM. Even so, maybe this is enough to solve your problem.
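
To make the ordering constraint concrete, here is a rough Python sketch that emulates the grammar above with a hand-written recursive-descent parser. It is not ANTLR and not generated by it. It assumes that alternatives are tried in the order listed, with backtracking when a list fails to close, and that whitespace between tokens is skipped. The function names and the nested-list tree representation are made up for illustration.

```python
import re

def tokenize(text):
    # INT, '(' and ')' tokens; whitespace (and anything else) is skipped.
    return re.findall(r"\d+|[()]", text)

def parse_atom(tokens, pos):
    # atom : INT
    if pos < len(tokens) and tokens[pos].isdigit():
        return tokens[pos], pos + 1
    return None

def parse_list(tokens, pos):
    # list : LPARAM expr+ RPARAM -> ^(LIST expr+)
    if pos >= len(tokens) or tokens[pos] != "(":
        return None
    children, pos = [], pos + 1
    while (result := parse_expr(tokens, pos)) is not None:
        child, pos = result
        children.append(child)
    if children and pos < len(tokens) and tokens[pos] == ")":
        return ["LIST", *children], pos + 1
    return None  # unbalanced or empty: the caller falls back to another alternative

def parse_expr(tokens, pos):
    # expr : list | atom
    return parse_list(tokens, pos) or parse_atom(tokens, pos)

def parse_root(tokens, pos):
    # root : expr | LPARAM | RPARAM
    # expr is tried first; only if it fails is the paren kept as a plain token.
    result = parse_expr(tokens, pos)
    if result is not None:
        return result
    return tokens[pos], pos + 1

def parse_start(text):
    # start : root+ EOF -> ^(LIST root+)
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("start requires at least one root")
    children, pos = [], 0
    while pos < len(tokens):
        child, pos = parse_root(tokens, pos)
        children.append(child)
    return ["LIST", *children]

print(parse_start("1 (2) 3"))  # ['LIST', '1', ['LIST', '2'], '3']
print(parse_start("((1 2 3"))
print(parse_start("(1) ) (2) 3))"))
```

In this sketch, swapping the order inside parse_root so that the bare LPARAM is accepted before expr is attempted would make every parenthesis a plain token, which is the fragility described above.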

Test cases (each output was an AST image that is not preserved here):

Test case 1 : no lists
Input: 1 2 3

Test case 2 : one list
Input: 1 (2) 3

Test case 3 : two lists
Input: (1) 2 (3)

Test case 4 : no lists, mismatched lefts
Input: ((1 2 3

Test case 5 : two lists, mismatched lefts
Input: ((1 (2) (3)

Test case 6 : no lists, mismatched rights
Input: 1 2 3))

Test case 7 : two lists, mismatched rights
Input: (1) (2) 3))

Test case 8 : two lists, mixed mismatched lefts
Input: ((1 (2) ( (3)

Test case 9 : two lists, mixed mismatched rights
Input: (1) ) (2) 3))


Here's a slightly more complicated grammar that operates on [] and () pairs. I think the solution is going to get exponentially worse as you add pairs, but hey, it's fun! You may also be hitting the limitation of what you can do with grammar-driven AST building.

start   : root+ EOF -> ^(LIST root+ )
        ;

root    : expr
        | LPARAM
        | RPARAM
        | LSQB
        | RSQB
        ;       
expr    : plist
        | slist
        | atom
        ;           

plist   : LPARAM pexpr* RPARAM -> ^(LIST pexpr*)
        ;

pexpr   : slist
        | atom
        | LSQB
        | RSQB
        ;       

slist   : LSQB sexpr* RSQB -> ^(LIST sexpr*)
        ;

sexpr   : plist
        | atom
        | LPARAM
        | RPARAM
        ;               

atom    : INT;

INT     : ('0'..'9')+;
LPARAM  : '(';
RPARAM  : ')';
LSQB    : '[';
RSQB    : ']';
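
The two-pair grammar can be explored the same way. The following Python sketch mirrors the rules exactly as written above. It again assumes ordered alternatives with backtracking and skipped whitespace; it is an emulation, not ANTLR output, and all helper names are hypothetical.

```python
import re

def tokenize(text):
    # INT plus the four bracket tokens; whitespace is skipped.
    return re.findall(r"\d+|[()\[\]]", text)

def token_at(tokens, pos, kinds):
    # Match a single token if it is one of `kinds`.
    if pos < len(tokens) and tokens[pos] in kinds:
        return tokens[pos], pos + 1
    return None

def parse_atom(tokens, pos):
    # atom : INT
    if pos < len(tokens) and tokens[pos].isdigit():
        return tokens[pos], pos + 1
    return None

def parse_bracketed(tokens, pos, opener, closer, parse_item):
    # OPEN item* CLOSE -> ^(LIST item*)
    if token_at(tokens, pos, {opener}) is None:
        return None
    children, pos = [], pos + 1
    while (result := parse_item(tokens, pos)) is not None:
        child, pos = result
        children.append(child)
    if token_at(tokens, pos, {closer}) is None:
        return None  # unbalanced: the caller tries its next alternative
    return ["LIST", *children], pos + 1

def parse_plist(tokens, pos):
    # plist : LPARAM pexpr* RPARAM
    return parse_bracketed(tokens, pos, "(", ")", parse_pexpr)

def parse_pexpr(tokens, pos):
    # pexpr : slist | atom | LSQB | RSQB
    return (parse_slist(tokens, pos) or parse_atom(tokens, pos)
            or token_at(tokens, pos, {"[", "]"}))

def parse_slist(tokens, pos):
    # slist : LSQB sexpr* RSQB
    return parse_bracketed(tokens, pos, "[", "]", parse_sexpr)

def parse_sexpr(tokens, pos):
    # sexpr : plist | atom | LPARAM | RPARAM
    return (parse_plist(tokens, pos) or parse_atom(tokens, pos)
            or token_at(tokens, pos, {"(", ")"}))

def parse_root(tokens, pos):
    # root : expr | LPARAM | RPARAM | LSQB | RSQB, with expr : plist | slist | atom
    return (parse_plist(tokens, pos) or parse_slist(tokens, pos)
            or parse_atom(tokens, pos)
            or token_at(tokens, pos, {"(", ")", "[", "]"}))

def parse_start(text):
    # start : root+ EOF -> ^(LIST root+)
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("start requires at least one root")
    children, pos = [], 0
    while pos < len(tokens):
        child, pos = parse_root(tokens, pos)
        children.append(child)
    return ["LIST", *children]

print(parse_start("(1 [2] 3)"))  # ['LIST', ['LIST', '1', ['LIST', '2'], '3']]
print(parse_start("( [ )"))      # ['LIST', ['LIST', '[']]
```

Note that pexpr has no plist alternative and sexpr has no slist alternative, so under this emulation a pair nested directly inside the same kind of pair, such as ((1)), is not grouped: only the inner (1) becomes a list and the outer parentheses stay plain tokens.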
