Explanation and solution for JavaCC's warning "Regular expression choice : FOO can never be matched as : BAR"?
Problem description
I am teaching myself to use JavaCC in a hobby project, and have a simple grammar to write a parser for. Part of the parser includes the following:
TOKEN : { < DIGIT : (["0"-"9"]) > }
TOKEN : { < INTEGER : (<DIGIT>)+ > }
TOKEN : { < INTEGER_PAIR : (<INTEGER>){2} > }
TOKEN : { < FLOAT : (<NEGATE>)? <INTEGER> | (<NEGATE>)? <INTEGER> "." <INTEGER> | (<NEGATE>)? <INTEGER> "." | (<NEGATE>)? "." <INTEGER> > }
TOKEN : { < FLOAT_PAIR : (<FLOAT>){2} > }
TOKEN : { < NUMBER_PAIR : <FLOAT_PAIR> | <INTEGER_PAIR> > }
TOKEN : { < NEGATE : "-" > }
When compiling with JavaCC I get the output:
Warning: Regular Expression choice : FLOAT_PAIR can never be matched as : NUMBER_PAIR
Warning: Regular Expression choice : INTEGER_PAIR can never be matched as : NUMBER_PAIR
I'm sure this is a simple concept but I don't understand the warning, being a novice in both parser generation and regular expressions.
What does this warning mean (in as-novice-as-you-can-get terms)?
Recommended answer
I don't know JavaCC, but I am a compiler engineer.
The FLOAT_PAIR rule is ambiguous. Consider the following text:
0.0
This could be FLOAT 0 followed by FLOAT .0; or it could be FLOAT 0. followed by FLOAT 0; both resulting in FLOAT_PAIR. Or it could be a single FLOAT 0.0.
More importantly, though, you are composing tokens inside the lexer in a way that is unlikely ever to work. Consider this number:
12345
This could be parsed as INTEGER 12, INTEGER 345, resulting in an INTEGER_PAIR. Or it could be parsed as INTEGER 123, INTEGER 45, another INTEGER_PAIR. Or it could be INTEGER 12345, another token. The problem exists because you are not requiring white space between the lexical elements of the INTEGER_PAIR (or FLOAT_PAIR).
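To make the ambiguity concrete, here is a minimal Java sketch that lists every way a string can be cut into two adjacent tokens. It is not JavaCC code: the class name PairSplits and the two java.util.regex patterns are assumptions, written by hand to approximate the INTEGER and FLOAT definitions from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PairSplits {
    // Hand-written approximations of the question's INTEGER and FLOAT tokens (assumption).
    static final Pattern INTEGER = Pattern.compile("[0-9]+");
    static final Pattern FLOAT = Pattern.compile("-?(?:[0-9]+(?:\\.[0-9]*)?|\\.[0-9]+)");

    // Returns every way to cut `text` into two non-empty pieces that both match `token`.
    static List<String> splits(Pattern token, String text) {
        List<String> result = new ArrayList<>();
        for (int i = 1; i < text.length(); i++) {
            String left = text.substring(0, i);
            String right = text.substring(i);
            if (token.matcher(left).matches() && token.matcher(right).matches()) {
                result.add(left + " | " + right);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(splits(INTEGER, "12345")); // four possible INTEGER pairs
        System.out.println(splits(FLOAT, "0.0"));     // two possible FLOAT pairs
    }
}
```

With no separator required between the two halves, "12345" has four valid readings as a pair and "0.0" has two, in addition to the reading of each as a single number.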
You should almost never try to handle pairs like this in the lexer. Instead, you should handle plain numbers (INTEGER and FLOAT) as tokens, and handle things like negation and pairing in the parser, where whitespace has been dealt with and stripped.
(For example, how are you going to process "----42"? This is a valid expression in most programming languages, which will correctly calculate multiple negations, but would not be handled by your lexer.)
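As a rough illustration of that division of labour, the hand-written Java sketch below lets the lexer produce only unsigned numbers and "-" tokens while skipping whitespace, and leaves negation and pairing to two small parser methods. It is not JavaCC output; the class NumberPairParser, its methods, and the informal grammar in the comments are all hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberPairParser {
    // Lexer: the only tokens are unsigned numbers and "-"; leading whitespace is skipped.
    private static final Pattern TOKEN =
        Pattern.compile("\\s*(?:([0-9]+(?:\\.[0-9]*)?|\\.[0-9]+)|(-))");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    NumberPairParser(String input) {
        Matcher m = TOKEN.matcher(input);
        int end = 0;
        // Greedy matching means "12345" is always one number, never two.
        while (end < input.length() && m.region(end, input.length()).lookingAt()) {
            tokens.add(m.group(1) != null ? m.group(1) : m.group(2));
            end = m.end();
        }
        if (!input.substring(end).trim().isEmpty()) {
            throw new IllegalArgumentException("Unexpected input at offset " + end);
        }
    }

    // signedNumber := "-" signedNumber | NUMBER
    double signedNumber() {
        if (pos < tokens.size() && tokens.get(pos).equals("-")) {
            pos++;
            return -signedNumber(); // each "-" flips the sign once
        }
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Number expected");
        }
        return Double.parseDouble(tokens.get(pos++));
    }

    // pair := signedNumber signedNumber
    double[] pair() {
        double first = signedNumber();
        double second = signedNumber();
        if (pos != tokens.size()) {
            throw new IllegalArgumentException("Trailing tokens after pair");
        }
        return new double[] { first, second };
    }
}
```

For instance, new NumberPairParser("----42 -0.5").pair() yields 42.0 and -0.5: the four negations cancel in the parser, and the whitespace decides where the first number ends.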
Also, be aware that single-digit integers in your lexer will not be matched as INTEGER; they will come out as DIGIT. I don't know the correct syntax for JavaCC to fix that for you, though. What you want is to define DIGIT not as a token, but simply as something you can use in the definitions of other tokens; alternatively, embed the definition of DIGIT ([0-9]) directly wherever you are using DIGIT in your rules.
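For what it is worth, JavaCC marks such a helper definition with a leading #, as in < #DIGIT : ["0"-"9"] >, which makes it usable inside other tokens without ever being emitted itself. The reason a single digit comes out as DIGIT is the usual lexer rule: the longest match wins, and on a tie the rule declared first wins. The Java sketch below simulates that rule; the class FirstRuleWins and its helpers are hypothetical, and java.util.regex stands in for the generated token manager, which is only adequate here because these simple greedy patterns already return the longest match.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstRuleWins {
    // Picks the rule with the longest match at the start of `input`.
    // On a tie the rule declared first wins (LinkedHashMap preserves declaration order).
    static String winner(Map<String, Pattern> rules, String input) {
        String best = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // Strict '>' means a later rule must match MORE text to displace an earlier one.
            if (m.lookingAt() && m.end() > bestLength) {
                best = rule.getKey();
                bestLength = m.end();
            }
        }
        return best;
    }

    // The two rules from the question, in their original declaration order.
    static Map<String, Pattern> rules() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("DIGIT", Pattern.compile("[0-9]"));
        rules.put("INTEGER", Pattern.compile("[0-9]+"));
        return rules;
    }

    public static void main(String[] args) {
        System.out.println(winner(rules(), "7"));  // DIGIT: same length, declared first
        System.out.println(winner(rules(), "42")); // INTEGER: longer match
    }
}
```

The same tie-break appears to be what the warning in the question reports: everything NUMBER_PAIR can match is matched, at the same length, by FLOAT_PAIR or INTEGER_PAIR, and both are declared earlier, so those choices can never surface as NUMBER_PAIR.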