JavaCC Unreachable Statement


Problem description

In my grammar there are production rules for expressions and fragments which originally contained indirect left recursion. These are the rules after I removed the recursion from them.

String expression() #Expression : {String number; Token t;}
{
    number = fragment()
    (
        (t = <Mult_Sign> number = fragment())
    )
    {return number;}
}

String fragment() #void : {String t;}
{
    t = identifier() {return t;}
    | t = number() {return t;}
    | (<PLUS> | <MINUS> ) fragment()
    | <LBR> expression() <RBR>
}

These production rules are used when parsing a condition in the grammar. However, with the production rules in their current order, only an expression is accepted, yet the parser should accept something like while (x <= 10). If I put the production rules in the opposite order, as originally stated in the grammar, then compiling the generated Java file with javac produces an error telling me that identifier() is an unreachable statement. This is the condition production rule:

void condition() #void : {Token t;}
{
    <NOT> expression()
    | expression (<EQUALS>|<NOTEQUALS>|<LT>|<GT>|<LTE>|<GTE>|<AND>|<OR>) expression()
    | identifier()
}

If anyone could explain why this problem is occurring, it would be very helpful.

Recommended answer

You have

void condition() #void : {Token t;}
{
/*a*/     <NOT> expression()
/*b*/     | expression (<EQUALS>|<NOTEQUALS>|<LT>|<GT>|<LTE>|<GTE>|<AND>|<OR>) expression()
/*c*/     | identifier()
}

If the parser is looking for a condition, it will try to choose between the three alternatives based on the next token of input. If that token is an identifier, there is a problem, since either alternative (b) or alternative (c) could work. Faced with a choice conflict, JavaCC prefers the first alternative, so (b) will be chosen. And if the next token is not an identifier, alternative (c) will not be chosen either. So either way, alternative (c) is never reached.
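The selection rule described above can be illustrated with a small hand-written Java sketch. This is not the code JavaCC generates; the class `ChoiceDemo`, the method names, and the token-kind strings are hypothetical and exist only to show why the third alternative can never be picked when alternatives are tried in order on a single token of lookahead.

```java
// Hypothetical sketch of a one-token choice among the alternatives of condition().
// Token kinds are plain strings here; this is NOT JavaCC's generated code.
class ChoiceDemo {
    // Returns the label of the alternative selected for the next token kind.
    static String choose(String nextKind) {
        // Alternative (a): <NOT> expression()
        if (nextKind.equals("NOT")) return "a";
        // Alternative (b): expression() op expression(), tried before (c)
        if (startsExpression(nextKind)) return "b";
        // Alternative (c): identifier(); an ID token was already claimed by (b)
        if (nextKind.equals("ID")) return "c";
        return "error";
    }

    // Token kinds that can begin an expression, following the fragment() rule.
    static boolean startsExpression(String kind) {
        return kind.equals("ID") || kind.equals("NUM") || kind.equals("PLUS")
            || kind.equals("MINUS") || kind.equals("LBR");
    }

    public static void main(String[] args) {
        for (String kind : new String[] {"NOT", "ID", "NUM", "LBR"}) {
            System.out.println(kind + " -> " + choose(kind));
        }
    }
}
```

No token kind makes `choose` return "c", which mirrors the situation in the grammar: the branch for alternative (c) exists but cannot be selected.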

That is your problem. What should be done about it? Here is the usual solution.

If you want to allow further operators in expressions, add more nonterminals, one for each level of precedence. For example

condition --> expression
expression --> disjunct (OR expression)?
disjunct --> conjunct (AND disjunct)?
conjunct --> comparand ((EQ|NEQ|LT|GT|LE|GE) comparand)?
comparand --> term ((PLUS|MINUS) term)*
term --> fragment ((TIMES | DIVIDE) fragment)*
fragment --> identifier | number | LBR expression RBR | (PLUS|MINUS|NOT) fragment



This grammar will accept everything you want and probably more. For example, if you have

statement --> WHILE condition DO statement

your parser will accept e.g. "WHILE a+b DO a:=b". In many languages this is taken care of by type checking; Java does it this way. In other languages it is dealt with by allowing all sorts of things as conditions; LISP does this.
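To make the layered grammar concrete, here is a minimal hand-written recursive-descent sketch in Java with one method per nonterminal. It is an illustration under stated assumptions, not JavaCC output: tokens are whitespace-separated words, keywords are upper case (PLUS, LT, LBR, ...), identifiers are lower case, and the class name `PrecedenceDemo` and its helpers are hypothetical. The parser returns a fully parenthesized string so the grouping is visible.

```java
// Hand-written sketch of the layered precedence grammar; not JavaCC-generated code.
// Assumption: whitespace-separated tokens, upper-case keywords, lower-case identifiers.
class PrecedenceDemo {
    private final String[] toks;
    private int pos = 0;

    PrecedenceDemo(String input) { toks = input.trim().split("\\s+"); }

    // Parses a whole condition (which is just an expression) and shows its grouping.
    static String parse(String input) {
        PrecedenceDemo p = new PrecedenceDemo(input);
        String tree = p.expression();
        if (p.pos != p.toks.length) throw new IllegalArgumentException("unexpected " + p.peek());
        return tree;
    }

    private String peek() { return pos < toks.length ? toks[pos] : "<EOF>"; }

    private boolean at(String... kinds) {
        for (String k : kinds) if (k.equals(peek())) return true;
        return false;
    }

    private String next() {
        if (pos >= toks.length) throw new IllegalArgumentException("unexpected end of input");
        return toks[pos++];
    }

    // Consumes one token and checks it against a regular expression.
    private String take(String pattern) {
        String t = next();
        if (!t.matches(pattern)) throw new IllegalArgumentException("unexpected " + t);
        return t;
    }

    // expression --> disjunct (OR expression)?
    private String expression() {
        String left = disjunct();
        if (at("OR")) { next(); return "(" + left + " OR " + expression() + ")"; }
        return left;
    }

    // disjunct --> conjunct (AND disjunct)?
    private String disjunct() {
        String left = conjunct();
        if (at("AND")) { next(); return "(" + left + " AND " + disjunct() + ")"; }
        return left;
    }

    // conjunct --> comparand ((EQ|NEQ|LT|GT|LE|GE) comparand)?
    private String conjunct() {
        String left = comparand();
        if (at("EQ", "NEQ", "LT", "GT", "LE", "GE")) {
            String op = next();
            return "(" + left + " " + op + " " + comparand() + ")";
        }
        return left;
    }

    // comparand --> term ((PLUS|MINUS) term)*
    private String comparand() {
        String left = term();
        while (at("PLUS", "MINUS")) {
            String op = next();
            left = "(" + left + " " + op + " " + term() + ")";
        }
        return left;
    }

    // term --> fragment ((TIMES|DIVIDE) fragment)*
    private String term() {
        String left = fragment();
        while (at("TIMES", "DIVIDE")) {
            String op = next();
            left = "(" + left + " " + op + " " + fragment() + ")";
        }
        return left;
    }

    // fragment --> identifier | number | LBR expression RBR | (PLUS|MINUS|NOT) fragment
    private String fragment() {
        if (at("PLUS", "MINUS", "NOT")) { String op = next(); return "(" + op + " " + fragment() + ")"; }
        if (at("LBR")) { next(); String inner = expression(); take("RBR"); return inner; }
        return take("[a-z][a-z0-9]*|[0-9]+");
    }
}
```

Every choice in this sketch is made on a single token of lookahead, so no alternative is shadowed. It also accepts purely arithmetic input such as "a PLUS b" as a condition, which is the over-acceptance discussed above.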

A note on the precedence of NOT

Most languages treat the precedence of NOT as very high, as in the second part of this answer. This has the nice effect of eliminating all choice warnings, since the grammar is LL(1).

However, if you want unary operators to have lower precedence, there is really nothing stopping you if you use JavaCC. E.g. you could change fragment to

fragment --> identifier | number | LBR expression RBR | (PLUS|MINUS) fragment | NOT conjunct



Now the grammar is not LL(1) (it is not even unambiguous), so JavaCC will give some choice conflict warnings. But it will actually parse e.g. "NOT a LT b" as "NOT (a LT b)".

What almost no language does is what I think you are trying to do, which is to restrict the syntax so that only expressions that look like conditions are allowed to be conditions. If this is truly what you want, then you can do it with JavaCC using syntactic lookahead. Here is how you do it.

Start with a grammar like this one. (This is essentially your idea with more attention paid to levels of precedence.)

condition --> disjunct (OR condition)?
disjunct --> conjunct (AND disjunct)?
conjunct --> expression (EQ|NEQ|LT|GT|LE|GE) expression
           | LBR condition RBR
           | NOT conjunct
           | identifier

expression --> term ((PLUS|MINUS) term)*
term --> fragment ((TIMES | DIVIDE) fragment)*
fragment --> identifier | number | LBR expression RBR | (PLUS|MINUS) fragment

This is an unambiguous grammar for conditions. However, it has a choice conflict at conjunct when the next token is an identifier or an LBR. To resolve this choice conflict, you look ahead for the comparison operator using syntactic lookahead, thus

void conjunct() : { } {
    LOOKAHEAD( expression() (<EQ>|<NEQ>|<LT>|<GT>|<LE>|<GE>) )
    expression() (<EQ>|<NEQ>|<LT>|<GT>|<LE>|<GE>) expression()
|   <LBR> condition() <RBR>
|   <NOT> conjunct()
|   identifier()
}
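The idea behind syntactic lookahead can be sketched in plain Java as a speculative parse: try to read "expression followed by a comparison operator", remember whether that succeeded, then rewind and parse for real. This is only an approximation of what JavaCC's LOOKAHEAD does, written by hand under assumptions: tokens are whitespace-separated words, keywords are upper case, identifiers are lower case, and `LookaheadDemo` and all helper names are hypothetical.

```java
// Hand-written sketch of syntactic lookahead via speculative parsing.
// NOT JavaCC-generated code; names and token conventions are assumptions.
class LookaheadDemo {
    private static final String[] CMP = {"EQ", "NEQ", "LT", "GT", "LE", "GE"};
    private final String[] toks;
    private int pos = 0;

    LookaheadDemo(String input) { toks = input.trim().split("\\s+"); }

    static String parse(String input) {
        LookaheadDemo p = new LookaheadDemo(input);
        String tree = p.condition();
        if (p.pos != p.toks.length) throw new IllegalArgumentException("unexpected " + p.peek());
        return tree;
    }

    private String peek() { return pos < toks.length ? toks[pos] : "<EOF>"; }

    private boolean at(String... kinds) {
        for (String k : kinds) if (k.equals(peek())) return true;
        return false;
    }

    private String next() {
        if (pos >= toks.length) throw new IllegalArgumentException("unexpected end of input");
        return toks[pos++];
    }

    // Consumes one token and checks it against a regular expression.
    private String take(String pattern) {
        String t = next();
        if (!t.matches(pattern)) throw new IllegalArgumentException("unexpected " + t);
        return t;
    }

    // condition --> disjunct (OR condition)?
    private String condition() {
        String left = disjunct();
        if (at("OR")) { next(); return "(" + left + " OR " + condition() + ")"; }
        return left;
    }

    // disjunct --> conjunct (AND disjunct)?
    private String disjunct() {
        String left = conjunct();
        if (at("AND")) { next(); return "(" + left + " AND " + disjunct() + ")"; }
        return left;
    }

    // Speculatively parses "expression comparison-operator", then always rewinds.
    private boolean looksLikeComparison() {
        int saved = pos;
        try {
            expression();
            return at(CMP);
        } catch (IllegalArgumentException e) {
            return false;
        } finally {
            pos = saved;
        }
    }

    // conjunct: the comparison alternative is chosen only if the lookahead succeeds.
    private String conjunct() {
        if (looksLikeComparison()) {
            String left = expression();
            String op = next();
            return "(" + left + " " + op + " " + expression() + ")";
        }
        if (at("LBR")) { next(); String inner = condition(); take("RBR"); return inner; }
        if (at("NOT")) { next(); return "(NOT " + conjunct() + ")"; }
        return take("[a-z][a-z0-9]*");
    }

    // expression --> term ((PLUS|MINUS) term)*
    private String expression() {
        String left = term();
        while (at("PLUS", "MINUS")) { String op = next(); left = "(" + left + " " + op + " " + term() + ")"; }
        return left;
    }

    // term --> fragment ((TIMES|DIVIDE) fragment)*
    private String term() {
        String left = fragment();
        while (at("TIMES", "DIVIDE")) { String op = next(); left = "(" + left + " " + op + " " + fragment() + ")"; }
        return left;
    }

    // fragment --> identifier | number | LBR expression RBR | (PLUS|MINUS) fragment
    private String fragment() {
        if (at("PLUS", "MINUS")) { String op = next(); return "(" + op + " " + fragment() + ")"; }
        if (at("LBR")) { next(); String inner = expression(); take("RBR"); return inner; }
        return take("[a-z][a-z0-9]*|[0-9]+");
    }
}
```

In this sketch an opening LBR is resolved by the speculative parse: "LBR a PLUS b RBR LT c" takes the comparison alternative, while "LBR a LT b RBR AND flag" falls through to the parenthesized-condition alternative. Purely arithmetic input such as "a PLUS b" is rejected as a condition. Note that the left operand is parsed twice, once speculatively and once for real.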

So why does (almost) no programming language do it this way? Most languages have variables of boolean type and so, like you, allow identifiers as conditions. So you still have to do type checking to rule out "WHILE i DO ..." where "i" is not of boolean type. Also, what should you use for the syntax of assignment? You need

statement --> identifier := (expression | condition) | ...

Even syntactic lookahead won't tell you which choice is right for "x := y". This is an ambiguous grammar.

If either choice is acceptable in the cases where both choices parse, then you can use syntactic lookahead here too.

void statement() : {} {
    identifier() <BECOMES> ( LOOKAHEAD(condition()) condition() | expression() )
| ...
}

This will parse "y" in "x:=y" as a condition even if it is numeric. If you are aware of this and design the rest of the compiler so that everything still works, no harm is done.

Another disadvantage of this approach is that parsing is now quadratic time in theory, because the lookahead may scan an expression before it is parsed for real. I don't think this is a serious concern.
