Explanation and solution for JavaCC's warning "Regular expression choice : FOO can never be matched as : BAR"?

Question

I am teaching myself to use JavaCC in a hobby project, and have a simple grammar to write a parser for. Part of the parser includes the following:

TOKEN : { < DIGIT : (["0"-"9"]) > }
TOKEN : { < INTEGER : (<DIGIT>)+ > }
TOKEN : { < INTEGER_PAIR : (<INTEGER>){2} > }
TOKEN : { < FLOAT : (<NEGATE>)? <INTEGER> | (<NEGATE>)? <INTEGER>  "." <INTEGER>  | (<NEGATE>)? <INTEGER> "." | (<NEGATE>)? "." <INTEGER> > } 
TOKEN : { < FLOAT_PAIR : (<FLOAT>){2} > }
TOKEN : { < NUMBER_PAIR : <FLOAT_PAIR> | <INTEGER_PAIR> > }
TOKEN : { < NEGATE : "-" > }

When compiling with JavaCC I get the output:

Warning: Regular Expression choice : FLOAT_PAIR can never be matched as : NUMBER_PAIR

Warning: Regular Expression choice : INTEGER_PAIR can never be matched as : NUMBER_PAIR

I'm sure this is a simple concept but I don't understand the warning, being a novice in both parser generation and regular expressions.

What does this warning mean (in as-novice-as-you-can-get terms)?

Answer

I don't know JavaCC, but I am a compiler engineer.

The FLOAT_PAIR rule is ambiguous. Consider the following text:

0.0

This could be FLOAT 0 followed by FLOAT .0; or it could be FLOAT 0. followed by FLOAT 0; both resulting in FLOAT_PAIR. Or it could be a single FLOAT 0.0.
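
To see the ambiguity concretely, here is a small Java sketch (an illustration, not JavaCC itself) that rewrites the question's FLOAT definition as a java.util.regex pattern and lists every way to cut an input into two complete FLOATs. The class and method names are invented for this example, and the regex is my own translation of the grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class FloatPairSplits {
    // The question's FLOAT token as a java.util.regex pattern (NEGATE inlined as "-"):
    // (-)? digits | (-)? digits "." digits | (-)? digits "." | (-)? "." digits
    static final Pattern FLOAT = Pattern.compile("-?(?:[0-9]+(?:\\.[0-9]*)?|\\.[0-9]+)");

    // Lists every way to cut `text` into two non-empty pieces that are each a whole FLOAT.
    static List<String> splits(String text) {
        List<String> result = new ArrayList<>();
        for (int i = 1; i < text.length(); i++) {
            String left = text.substring(0, i);
            String right = text.substring(i);
            if (FLOAT.matcher(left).matches() && FLOAT.matcher(right).matches()) {
                result.add(left + " | " + right);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(splits("0.0"));                    // [0 | .0, 0. | 0]
        System.out.println(FLOAT.matcher("0.0").matches());   // true: also a single FLOAT
    }
}
```

Both splits are valid FLOAT_PAIRs, and the whole string is also a valid FLOAT, so the grammar alone does not say which reading is meant.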

More importantly, though, you are composing tokens out of other tokens in the lexer in a way that is unlikely ever to work. Consider this number:

12345

This could be split into INTEGER 12 and INTEGER 345, giving an INTEGER_PAIR; or into INTEGER 123 and INTEGER 45, another INTEGER_PAIR. Or it could be a single token, INTEGER 12345. The problem exists because you do not require whitespace between the lexical elements of the INTEGER_PAIR (or FLOAT_PAIR).
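
On the warning itself: JavaCC's token manager takes the longest possible match, and when several token definitions match the same longest string, the one declared first in the grammar wins. Every string that NUMBER_PAIR could match through its FLOAT_PAIR choice is matched just as long by FLOAT_PAIR, which is declared earlier, so that choice can never produce a NUMBER_PAIR token; the same holds for INTEGER_PAIR. That is what the two warnings report. The sketch below approximates this rule with java.util.regex. It is not JavaCC's generated code; the class name and the regex translations of the question's tokens are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class JavaccTieBreak {
    static final String F = "-?(?:[0-9]+(?:\\.[0-9]*)?|\\.[0-9]+)"; // FLOAT
    static final String IP = "(?:[0-9]+){2}";                        // INTEGER_PAIR
    static final String FP = "(?:" + F + "){2}";                     // FLOAT_PAIR

    // The question's tokens in declaration order (LinkedHashMap keeps insertion order).
    static final Map<String, Pattern> TOKENS = new LinkedHashMap<>();
    static {
        TOKENS.put("DIGIT", Pattern.compile("[0-9]"));
        TOKENS.put("INTEGER", Pattern.compile("[0-9]+"));
        TOKENS.put("INTEGER_PAIR", Pattern.compile(IP));
        TOKENS.put("FLOAT", Pattern.compile(F));
        TOKENS.put("FLOAT_PAIR", Pattern.compile(FP));
        TOKENS.put("NUMBER_PAIR", Pattern.compile(FP + "|" + IP));
        TOKENS.put("NEGATE", Pattern.compile("-"));
    }

    // Length of the longest prefix of `input` that `p` matches completely (0 if none).
    static int longestPrefix(Pattern p, String input) {
        for (int len = input.length(); len > 0; len--) {
            if (p.matcher(input.substring(0, len)).matches()) return len;
        }
        return 0;
    }

    // Longest match wins; on a tie the earlier declaration wins (strict '>' below).
    static String firstToken(String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> e : TOKENS.entrySet()) {
            int len = longestPrefix(e.getValue(), input);
            if (len > bestLen) {
                best = e.getKey();
                bestLen = len;
            }
        }
        return best == null ? "no match" : best + "(" + input.substring(0, bestLen) + ")";
    }

    public static void main(String[] args) {
        // Prints: 12345 -> INTEGER(12345), 0.0 -> FLOAT(0.0), 1.2.3 -> FLOAT_PAIR(1.2.3)
        for (String s : new String[] {"12345", "0.0", "1.2.3"}) {
            System.out.println(s + " -> " + firstToken(s));
        }
    }
}
```

Note that "12345" comes out as one INTEGER, not as a pair. Because NUMBER_PAIR can only ever tie with an earlier rule and never beat it, it never wins for any input, which is exactly the situation the warnings describe.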

You should almost never try to handle pairs like this in the lexer. Instead, you should handle plain numbers (INTEGER and FLOAT) as tokens, and handle things like negation and pairing in the parser, where whitespace has been dealt with and stripped.
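
A minimal sketch of that split, written as a hand-written lexer and recursive-descent parser in plain Java rather than as a JavaCC grammar: the lexer skips whitespace and emits only unsigned numbers and "-", and the parser rules (hypothetical names signedNumber and pair) handle negation and pairing. In a JavaCC grammar the same idea would presumably become a SKIP declaration for whitespace, tokens for numbers and "-", and parser productions for the two rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairParser {
    // Lexer: leading whitespace is skipped; group 1 is an unsigned number
    // (digits, digits ".", digits "." digits, or "." digits), group 2 is "-".
    static final Pattern TOKEN =
            Pattern.compile("\\s*(?:([0-9]+(?:\\.[0-9]*)?|\\.[0-9]+)|(-))");

    static List<String> lex(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        int pos = 0;
        while (pos < text.length()) {
            m.region(pos, text.length());
            if (!m.lookingAt()) {
                if (text.substring(pos).trim().isEmpty()) break; // only trailing whitespace left
                throw new IllegalArgumentException("unexpected input at index " + pos);
            }
            tokens.add(m.group(1) != null ? m.group(1) : "-");
            pos = m.end();
        }
        return tokens;
    }

    // Parser: negation and pairing are grammar rules, not token shapes.
    private final List<String> tokens;
    private int next = 0;

    PairParser(List<String> tokens) { this.tokens = tokens; }

    // signedNumber := "-" signedNumber | NUMBER
    double signedNumber() {
        if (next >= tokens.size()) throw new IllegalStateException("expected a number");
        String t = tokens.get(next++);
        if (t.equals("-")) return -signedNumber();
        return Double.parseDouble(t);
    }

    // pair := signedNumber signedNumber
    double[] pair() {
        return new double[] {signedNumber(), signedNumber()};
    }

    public static void main(String[] args) {
        double[] p = new PairParser(lex("12 345")).pair();
        System.out.println(p[0] + ", " + p[1]); // 12.0, 345.0
    }
}
```

Because the pair now needs two separate tokens, "12 345" is unambiguous, and "12345" is simply one number.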

(For example, how are you going to process "----42"? This is a valid expression in most programming languages, which evaluate the repeated negations correctly, but your lexer would not handle it.)
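
Handled in the parser, repeated negation is just a recursive rule, unary := "-" unary | number. Here is a small standalone Java sketch of that rule; the class name is hypothetical and the number scanning is deliberately simplified (it accepts digits and dots and leaves validation to Double.parseDouble).

```java
class UnaryMinus {
    // Hypothetical recursive-descent helper for: unary := "-" unary | number
    // Whitespace between tokens is skipped; `pos` is the index of the next unread char.
    private final String src;
    private int pos = 0;

    UnaryMinus(String src) { this.src = src; }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    double unary() {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == '-') {
            pos++;
            return -unary(); // each "-" negates whatever follows it
        }
        int start = pos;
        while (pos < src.length()
                && ((src.charAt(pos) >= '0' && src.charAt(pos) <= '9') || src.charAt(pos) == '.')) {
            pos++;
        }
        if (start == pos) throw new IllegalArgumentException("expected a number at index " + start);
        return Double.parseDouble(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(new UnaryMinus("----42").unary()); // 42.0
        System.out.println(new UnaryMinus("---42").unary());  // -42.0
        System.out.println(new UnaryMinus("- - 7").unary());  // 7.0
    }
}
```

Each "-" flips the sign of whatever follows, so an even number of minus signs gives the positive value and whitespace between them makes no difference.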

Also, be aware that single-digit integers in your lexer will not be matched as INTEGER; they will come out as DIGIT. I don't know the correct JavaCC syntax to fix that for you, though. What you want is to define DIGIT not as a token but simply as something you can use in the definitions of other tokens; alternatively, embed the definition of DIGIT ([0-9]) directly wherever you use DIGIT in your rules.
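
For the record, JavaCC does support the first option: prefixing a label with # declares a private regular expression, for example < #DIGIT : ["0"-"9"] >, which other token definitions can reference but which is never returned as a token on its own. The Java sketch below shows the effect of taking DIGIT out of the competing tokens. It is a simplification with invented names: it only compares tokens that match the whole input, which is the case where JavaCC's declaration-order tie-break decides.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class SingleDigit {
    // Token set as in the question: DIGIT is a real token declared before INTEGER.
    static final Map<String, Pattern> WITH_DIGIT_TOKEN = new LinkedHashMap<>();
    // DIGIT inlined into INTEGER (or made private with '#'), so it no longer competes.
    static final Map<String, Pattern> DIGIT_INLINED = new LinkedHashMap<>();
    static {
        WITH_DIGIT_TOKEN.put("DIGIT", Pattern.compile("[0-9]"));
        WITH_DIGIT_TOKEN.put("INTEGER", Pattern.compile("[0-9]+"));
        DIGIT_INLINED.put("INTEGER", Pattern.compile("[0-9]+"));
    }

    // Simplified: returns the first declared token that matches the whole input.
    static String winner(Map<String, Pattern> tokens, String input) {
        for (Map.Entry<String, Pattern> e : tokens.entrySet()) {
            if (e.getValue().matcher(input).matches()) return e.getKey();
        }
        return "no match";
    }

    public static void main(String[] args) {
        System.out.println(winner(WITH_DIGIT_TOKEN, "7")); // DIGIT
        System.out.println(winner(DIGIT_INLINED, "7"));    // INTEGER
    }
}
```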
