如何在没有回溯的情况下同时进行函数调用和括号分组 [英] How to have both function calls and parenthetical grouping without backtrack

查看:20
本文介绍了如何在没有回溯的情况下同时进行函数调用和括号分组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有什么方法可以指定允许以下语法的语法:

f(x)(g, (1-(-2))*3, 1+2*3)[0]

转换成(以伪lisp显示顺序):

(索引((f x)G(* (- 1 -2) 3)(+ (* 2 3) 1))0)

以及有限的运算符优先级等.

<小时>

以下语法适用于 backtrack = true,但我想避免这种情况:

语法T;选项 {输出=AST;回溯=真;记忆=真;}令牌{称呼;指数;抬头;}编: (expr '\n')* ;表达式:布尔表达式;布尔表达式: relExpr (boolop^ relExpr)?;relExpr: addExpr (relop^ addExpr)?|a=addExpr oa=relop b=addExpr ob=relop c=addExpr->^(土地 ^($oa $a $b) ^($ob $b $c));添加表达式: mulExpr (addop^ mulExpr)?;多表达式: atomExpr (mulop^ atomExpr)?;原子表达式: INT|ID|OPAREN expr CPAREN ->表达式|称呼;称呼: callable ( OPAREN (expr (COMMA expr)*)? CPAREN -> ^(CALL callable expr*)|OBRACK expr CBRACK ->^(INDEX 可调用 expr)|点 ID ->^(INDEX 可调用 ID));分段可调用的: ID|OPAREN expr CPAREN;分段布洛普: 土地 |洛尔;分段重放: (EQ|GT|LT|GTE|LTE);分段插件:(加号|减号);分段穆洛普:(时间|划分);情商:'==';GT : '>';LT : '<';GTE : '>=' ;LTE : '<=' ;土地:'&&';LOR : '||';加号:'+';减 : '-' ;时间:'*';划分 : '/' ;ID : ('a'..'z')+ ;INT:'0'..'9';奥帕伦:'(';CPAREN : ')' ;奥布拉克:'[';CBRACK : ']' ;点:'.';逗号 : ',' ;

解决方案

你的语法有几个问题:

1

只有词法规则可以是 fragments,不能是解析器规则.一些 ANTLR 目标简单地忽略解析器规则前面的片段关键字(如 Java 目标),但最好将它们从语法中删除:如果您决定为不同的目标语言创建解析器,您可能会遇到问题,因为

2

如果没有 backtrack=true,就不能混合使用树重写操作符(^!)重写规则 (->),因为您需要在 relExpr 内创建一个替代方案,而不是您现在拥有的两个替代方案(这是为了消除歧义).

在您的情况下,您不能仅使用 ^(在单个替代方案中)创建所需的 AST,因此您需要这样做:

relExpr: (a=addExpr -> $a) ( (oa=relOp b=addExpr -> ^($oa $a $b))( ob=relOp c=addExpr -> ^(LAND ^($oa $a $b) ^($ob $b $c)))?)?;

(是的,我知道,它不是特别漂亮,但这也无济于事)

另外,如果tokens { ... } 块中定义了LAND 令牌,则只能将其放入重写规则中:

令牌{//文字标记土地='&&';...//假想标记称呼;...}

否则,您只能在重写规则中使用令牌(和其他解析器规则),前提是它们确实出现在解析器规则本身内部.

3

您没有考虑语法中的一元减号,请按如下方式实现:

mulExpr: unaryExpr ((TIMES | DIVIDE)^ unaryExpr)*;一元表达式: 减去 atomExpr ->^(UNARY_MINUS atomExpr)|原子表达式;

<小时>

现在,要创建一个不需要 backtrack=true 的语法,请删除 ID'(' expr ')'你的 atomExpr 规则:

atomExpr: INT|称呼;

并在您的 call 规则中设置所有通过 callable 的可选内容:

调用: (callable -> callable) ( OPAREN params CPAREN -> ^(CALL $call params)|OBRACK expr CBRACK ->^(INDEX $call expr)|点 ID ->^(INDEX $呼叫ID))*;

这样,ID'(' expr ')' 已经被 call 匹配了(并且没有歧义).><小时>

综合以上所有注释,您可以得到以下语法:

语法T;选项 {输出=AST;}令牌{//文字标记情商 = '==' ;GT = '>';LT = '<';GTE = '>=' ;LTE = '<= ;土地 = '&&';LOR = '||';加号 = '+' ;减号 = '-' ;时间 = '*' ;除法 = '/' ;OPAREN = '(' ;CPAREN = ')' ;奥布拉克 = '[' ;CBRACK = ']' ;点 = '.';逗号 = ',' ;//假想标记称呼;指数;抬头;UNARY_MINUS;参数;}编: expr EOF ->表达式;表达式: 布尔表达式;布尔表达式: relExpr ((LAND | LOR)^ relExpr)?;relExpr: (a=addExpr -> $a) ( (oa=relOp b=addExpr -> ^($oa $a $b))( ob=relOp c=addExpr -> ^(LAND ^($oa $a $b) ^($ob $b $c)))?)?;添加表达式: mulExpr ((PLUS | MINUS)^ mulExpr)*;多表达式: unaryExpr ((TIMES | DIVIDE)^ unaryExpr)*;一元表达式: 减去 atomExpr ->^(UNARY_MINUS atomExpr)|原子表达式;原子表达式: INT|称呼;称呼: (callable -> callable) ( OPAREN params CPAREN -> ^(CALL $call params)|OBRACK expr CBRACK ->^(INDEX $call expr)|点 ID ->^(INDEX $呼叫ID))*;可调用的: ID|OPAREN expr CPAREN ->表达式;参数: (expr (COMMA expr)*)?->^(参数表达式*);关系: 情商 |GT |LT |GTE |LTE;ID : 'a'..'z'+ ;INT:'0'..'9'+;空格 : (' ' | '\t') {skip();};

这会将输入 "a >= b < c" 解析为以下 AST:

和输入"f(x)(g, (1-(-2))*3, 1+2*3)[0]"如下:

Is there any way to specify a grammar which allows the following syntax:

f(x)(g, (1-(-2))*3, 1+2*3)[0]

which is transformed into (in pseudo-lisp to show order):

(index 
  ((f x)
    g 
    (* (- 1 -2) 3)
    (+ (* 2 3) 1)
  ) 
  0
)

along with things like limited operator precedence etc.


The following grammar works with backtrack = true, but I'd like to avoid that:

grammar T;

options {
 output=AST;
 backtrack=true;
 memoize=true;
}

tokens {
  CALL;
  INDEX;
  LOOKUP;
}

prog: (expr '\n')* ;

expr : boolExpr;

boolExpr
 : relExpr (boolop^ relExpr)?
 ;

relExpr
 : addExpr (relop^ addExpr)?
 | a=addExpr oa=relop b=addExpr ob=relop c=addExpr
   -> ^(LAND ^($oa $a $b) ^($ob $b $c))
 ;


addExpr
 : mulExpr (addop^ mulExpr)?
 ;

mulExpr
 : atomExpr (mulop^ atomExpr)?
 ;

atomExpr
 : INT
 | ID
 | OPAREN expr CPAREN -> expr
 | call
 ;

call
 : callable ( OPAREN (expr (COMMA expr)*)? CPAREN -> ^(CALL callable expr*)
            | OBRACK expr CBRACK -> ^(INDEX callable expr)
            | DOT ID -> ^(INDEX callable ID)
            )
 ;

fragment
callable
 : ID
 | OPAREN expr CPAREN
 ;

fragment
boolop
 : LAND | LOR
 ;

fragment
relop
 : (EQ|GT|LT|GTE|LTE)
 ;

fragment
addop
 : (PLUS|MINUS)
 ;

fragment
mulop
 : (TIMES|DIVIDE)
 ;

EQ : '==' ;
GT : '>' ;
LT : '<' ;
GTE : '>=' ;
LTE : '<=' ;

LAND : '&&' ;
LOR : '||' ;

PLUS : '+' ;
MINUS : '-' ;
TIMES : '*' ;
DIVIDE : '/' ;

ID : ('a'..'z')+ ;
INT : '0'..'9' ;

OPAREN : '(' ;
CPAREN : ')' ;
OBRACK : '[' ;
CBRACK : ']' ;
DOT : '.' ;

COMMA : ',' ;

解决方案

There are a couple of things wrong with your grammar:

1

Only lexer rules can be fragments, not parser rules. Some ANTLR targets simply ignore the fragment keyword in front of parser rules (like the Java target), but better just remove them from your grammar: if you decide to create a parser for a different target-language, you may run into problems because of it.

2

Without the backtrack=true, you cannot mix tree-rewrite operators (^ and !) and rewrite rules (->) because you need to create a single alternative inside relExpr instead of the two alternatives you now have (this is to eliminate an ambiguity).

In your case, you can't create the desired AST with just ^ (inside a single alternative), so you'll need to do it like this:

relExpr
 : (a=addExpr -> $a) ( (oa=relOp b=addExpr    -> ^($oa $a $b))
                         ( ob=relOp c=addExpr -> ^(LAND ^($oa $a $b) ^($ob $b $c))
                         )?
                     )?
 ;

(yes, I know, it's not particularly pretty, but that can't be helped AFAIK)

Also, you can only put the LAND token in the rewrite rules if it is defined in the tokens { ... } block:

tokens {
  // literal tokens
  LAND='&&';
  ...

  // imaginary tokens
  CALL;
  ...
}

Otherwise you can only use tokens (and other parser rules) in rewrite rules if they really occur inside the parser rule itself.

3

You did not account for the unary minus in your grammar, implement it like this:

mulExpr
 : unaryExpr ((TIMES | DIVIDE)^ unaryExpr)*
 ;

unaryExpr
 : MINUS atomExpr -> ^(UNARY_MINUS atomExpr)
 | atomExpr
 ;


Now, to create a grammar that does not need backtrack=true, remove the ID and '(' expr ')' from your atomExpr rule:

atomExpr
 : INT
 | call
 ;

and make everything passed callable optional inside your call rule:

call
 : (callable -> callable) ( OPAREN params CPAREN -> ^(CALL $call params)
                          | OBRACK expr CBRACK   -> ^(INDEX $call expr)
                          | DOT ID               -> ^(INDEX $call ID)
                          )*
 ;

That way, ID and '(' expr ')' are already matched by call (and there's no ambiguity).


Taken all the remarks above into account, you could get the following grammar:

grammar T;

options {
  output=AST;
}

tokens {
 // literal tokens
 EQ     = '==' ;
 GT     = '>' ;
 LT     = '<' ;
 GTE    = '>=' ;
 LTE    = '<=' ;
 LAND   = '&&' ;
 LOR    = '||' ;
 PLUS   = '+' ;
 MINUS  = '-' ;
 TIMES  = '*' ;
 DIVIDE = '/' ;
 OPAREN = '(' ;
 CPAREN = ')' ;
 OBRACK = '[' ;
 CBRACK = ']' ;
 DOT    = '.' ;
 COMMA  = ',' ;

 // imaginary tokens
 CALL;
 INDEX;
 LOOKUP;
 UNARY_MINUS;
 PARAMS;
}

prog
 : expr EOF -> expr
 ;

expr
 : boolExpr
 ;

boolExpr
 : relExpr ((LAND | LOR)^ relExpr)?
 ;

relExpr
 : (a=addExpr -> $a) ( (oa=relOp b=addExpr    -> ^($oa $a $b))
                         ( ob=relOp c=addExpr -> ^(LAND ^($oa $a $b) ^($ob $b $c))
                         )?
                     )?
 ;

addExpr
 : mulExpr ((PLUS | MINUS)^ mulExpr)*
 ;

mulExpr
 : unaryExpr ((TIMES | DIVIDE)^ unaryExpr)*
 ;

unaryExpr
 : MINUS atomExpr -> ^(UNARY_MINUS atomExpr)
 | atomExpr
 ;

atomExpr
 : INT
 | call
 ;

call
 : (callable -> callable) ( OPAREN params CPAREN -> ^(CALL $call params)
                          | OBRACK expr CBRACK   -> ^(INDEX $call expr)
                          | DOT ID               -> ^(INDEX $call ID)
                          )*
 ;

callable
 : ID
 | OPAREN expr CPAREN -> expr
 ;

params
 : (expr (COMMA expr)*)? -> ^(PARAMS expr*)
 ;

relOp
 : EQ | GT | LT | GTE | LTE
 ;

ID     : 'a'..'z'+ ;
INT    : '0'..'9'+ ;
SPACE  : (' ' | '\t') {skip();};

which would parse the input "a >= b < c" into the following AST:

and the input "f(x)(g, (1-(-2))*3, 1+2*3)[0]" as follows:

这篇关于如何在没有回溯的情况下同时进行函数调用和括号分组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆