How to resolve an ambiguity

Question

I have a grammar:

grammar Test;

s      : ID OP (NUMBER | ID);

ID     : [a-z]+ ;
NUMBER : '.'? [0-9]+ ;

OP     : '/.' | '/' ;
WS     : [ \t\r\n]+ -> skip ;

An expression like x/.123 can either be parsed as (s x /. 123), or as (s x / .123). With the grammar above I get the first variant.

Is there a way to get both parse trees? Is there a way to control how it is parsed? Say, if there is a number after the /. then I emit / in the tree, otherwise I emit /. .

I am new to ANTLR.

Recommended answer

An expression like x/.123 can either be parsed as (s x /. 123), or as (s x / .123)

I'm not sure. On the ReplaceAll page (*), in the Possible Issues section, it is said that "Periods bind to numbers more strongly than to slash", so /.123 will always be interpreted as a division by the number .123. It then says that, to avoid this issue, a space must be inserted in the input between the /. operator and the number if you want it to be understood as a replacement.

So there is only one possible parse tree (otherwise, how could the Wolfram parser decide how to interpret the statement?).

The ANTLR4 lexer and parser are greedy: the lexer (parser) tries to consume as many input characters (tokens) as it can while matching a rule. With your rule OP : '/.' | '/' ; the lexer will always match the input /. to the '/.' alternative (even if the rule is written OP : '/' | '/.' ;). This means there is no ambiguity, and the input has no chance of being interpreted as OP=/ and NUMBER=.123.
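
To make the longest-match behaviour concrete, here is a minimal Java sketch that imitates it for the four lexer rules of grammar Test. It is not ANTLR's implementation, only a simulation with java.util.regex; the class name MaximalMunchDemo and the "NAME:text" output format are made up for illustration. The '/' alternative is deliberately listed before '/.' to show that the order does not change the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: simulates longest-match tokenization, it does not use ANTLR.
class MaximalMunchDemo {
    // Lexer rules of grammar Test, one entry per alternative.
    // '/' is listed before '/.' on purpose: the longest match still wins.
    private static final String[][] RULES = {
        {"ID", "[a-z]+"},
        {"NUMBER", "\\.?[0-9]+"},
        {"OP", "/"},
        {"OP", "/\\."},
        {"WS", "[ \\t\\r\\n]+"},
    };

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (String[] rule : RULES) {
                Matcher m = Pattern.compile(rule[1]).matcher(input);
                m.region(pos, input.length());
                // A strictly longer match replaces the current best;
                // on a tie the rule listed first is kept.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule[0];
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            if (!bestName.equals("WS")) { // mirrors "WS -> skip"
                tokens.add(bestName + ":" + input.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x/.123"));  // [ID:x, OP:/., NUMBER:123]
        System.out.println(tokenize("x/ .123")); // [ID:x, OP:/, NUMBER:.123]
    }
}
```

With this rule set, the only way to obtain OP=/ followed by NUMBER=.123 is to separate the slash from the dot in the input, as the second call shows.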

Given my limited experience with ANTLR, I have found no other solution than to split the ReplaceAll operator into two tokens.

Grammar file Question.g4:

grammar Question;

/* Parse Wolfram ReplaceAll. */

question
@init {System.out.println("Question last update 0851");}
    :   s+ EOF
    ;

s   :   division
    |   replace_all
    ;

division
    :   expr '/' NUMBER
        {System.out.println("found division " + $expr.text + " by " + $NUMBER.text);}
    ;

replace_all
    :   expr '/' '.' replacement
        {System.out.println("found ReplaceAll " + $expr.text + " with " + $replacement.text);}
    ;

expr
    :   ID
    |   '"' ID '"'
    |   NUMBER
    |   '{' expr ( ',' expr )* '}'
    ;

replacement
    :   expr '->' expr    
    |   '{' replacement ( ',' replacement )* '}'
    ;

ID     : [a-z]+ ;
NUMBER : '.'? [0-9]+ ;
WS     : [ \t\r\n]+ -> skip ;

Input file t.text :

x/.123
x/.x -> 1
{x, y}/.{x -> 1, y -> 2}
{0, 1}/.0 -> "zero"
{0, 1}/. 0 -> "zero"

Execution:

$ export CLASSPATH=".:/usr/local/lib/antlr-4.6-complete.jar"
$ alias a4='java -jar /usr/local/lib/antlr-4.6-complete.jar'
$ alias grun='java org.antlr.v4.gui.TestRig'
$ a4 Question.g4 
$ javac Q*.java
$ grun Question question -tokens -diagnostics t.text 
[@0,0:0='x',<ID>,1:0]
[@1,1:1='/',<'/'>,1:1]
[@2,2:5='.123',<NUMBER>,1:2]
[@3,7:7='x',<ID>,2:0]
[@4,8:8='/',<'/'>,2:1]
[@5,9:9='.',<'.'>,2:2]
[@6,10:10='x',<ID>,2:3]
[@7,12:13='->',<'->'>,2:5]
[@8,15:15='1',<NUMBER>,2:8]
[@9,17:17='{',<'{'>,3:0]
...
[@29,47:47='}',<'}'>,4:5]
[@30,48:48='/',<'/'>,4:6]
[@31,49:50='.0',<NUMBER>,4:7]
...
[@40,67:67='}',<'}'>,5:5]
[@41,68:68='/',<'/'>,5:6]
[@42,69:69='.',<'.'>,5:7]
[@43,71:71='0',<NUMBER>,5:9]
...
[@48,83:82='<EOF>',<EOF>,6:0]
Question last update 0851
found division x by .123
found ReplaceAll x with x->1
found ReplaceAll {x,y} with {x->1,y->2}
found division {0,1} by .0
line 4:10 extraneous input '->' expecting {<EOF>, '"', '{', ID, NUMBER}
found ReplaceAll {0,1} with 0->"zero"

The input x/.123 is ambiguous only up to the slash. After it, the parser has two choices: '/' NUMBER in the division rule, or '/' '.' replacement in the replace_all rule. I think that NUMBER absorbs the dot (the token dump shows .123 arriving as a single NUMBER token), so there is no more ambiguity.
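
The decision described above can be sketched in a few lines of plain Java, without ANTLR. The class SlashDotDemo and its methods are hypothetical names, and the single ordered regex is a simplification that happens to agree with longest-match lexing for this particular token set; it only looks at the token that follows the first slash and is not a parser for the whole grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: mimics the token split of Question.g4, it does not use ANTLR.
class SlashDotDemo {
    // NUMBER is tried first, so ".0" becomes one token, as in the token dump;
    // a lone "." is only produced when no digit follows it directly.
    private static final Pattern TOKEN =
        Pattern.compile("\\.?[0-9]+|[a-z]+|->|[/.{},\"]");

    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) { // characters between matches (whitespace) are skipped
            out.add(m.group());
        }
        return out;
    }

    // Looks at the token right after the first '/', as the two parser rules do.
    static String classify(String line) {
        List<String> t = tokens(line);
        int slash = t.indexOf("/");
        if (slash < 0 || slash + 1 >= t.size()) {
            return "neither";
        }
        String next = t.get(slash + 1);
        if (next.equals(".")) {
            return "ReplaceAll";          // '/' '.' replacement
        }
        if (next.matches("\\.?[0-9]+")) {
            return "division";            // '/' NUMBER
        }
        return "neither";
    }

    public static void main(String[] args) {
        System.out.println(classify("x/.123"));                  // division
        System.out.println(classify("x/.x -> 1"));               // ReplaceAll
        System.out.println(classify("{0, 1}/.0 -> \"zero\""));   // division
        System.out.println(classify("{0, 1}/. 0 -> \"zero\""));  // ReplaceAll
    }
}
```

The four calls correspond to lines 1, 2, 4 and 5 of t.text and give the same division/ReplaceAll split as the grun run: a digit directly after the dot turns the input into a division, while a space after the dot keeps it a ReplaceAll.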

(*) The link was in a comment posted yesterday that has since disappeared: Wolfram Language & System, ReplaceAll.
