ANTLR4 negative lookahead in lexer
Question
I am trying to define lexer rules for PostgreSQL SQL.
The problem is that the operator definition and line comments conflict with each other.
For example, @--- is the operator token @- followed by a -- comment, not the single operator token @---.
In grako it would be possible to define a negative lookahead for the - fragment like this:
OP_MINUS: '-' ! ( '-' ) .
In ANTLR4 I could not find any way to roll back an already consumed fragment.
Any ideas?
Here is the original definition of what a PostgreSQL operator name can be:
The operator name is a sequence of up to NAMEDATALEN-1
(63 by default) characters from the following list:
+ - * / < > = ~ ! @ # % ^ & | ` ?
There are a few restrictions on your choice of name:
-- and /* cannot appear anywhere in an operator name,
since they will be taken as the start of a comment.
A multicharacter operator name cannot end in + or -,
unless the name also contains at least one of these
characters:
~ ! @ # % ^ & | ` ?
For example, @- is an allowed operator name, but *- is not.
This restriction allows PostgreSQL to parse SQL-compliant
commands without requiring spaces between tokens.
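The quoted restrictions can be read as a small predicate over candidate names. The following is a minimal Python sketch of that reading, not PostgreSQL's own code; the function name `is_valid_operator_name` and the constant names are hypothetical, and `NAMEDATALEN = 64` is assumed from the "63 by default" limit quoted above.

```python
# Characters allowed in an operator name, per the quoted list.
OP_CHARS = set("+-*/<>=~!@#%^&|`?")
# Characters that permit a trailing + or - in a multicharacter name.
SPECIAL_CHARS = set("~!@#%^&|`?")
# Assumed default; names are limited to NAMEDATALEN-1 characters.
NAMEDATALEN = 64


def is_valid_operator_name(name: str) -> bool:
    """Return True if `name` satisfies the restrictions quoted above."""
    if not name or len(name) > NAMEDATALEN - 1:
        return False
    if any(ch not in OP_CHARS for ch in name):
        return False
    # "--" and "/*" would be taken as the start of a comment.
    if "--" in name or "/*" in name:
        return False
    # A multicharacter name may end in + or - only if it also
    # contains at least one of the special characters.
    if len(name) > 1 and name[-1] in "+-":
        return any(ch in SPECIAL_CHARS for ch in name)
    return True
```

With this sketch, `is_valid_operator_name("@-")` is true and `is_valid_operator_name("*-")` is false, matching the example in the quoted text.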
Recommended answer
You can use a semantic predicate in your lexer rules to perform lookahead (or lookbehind) without consuming characters. For example, the following covers several of the rules for an operator.
OPERATOR
    : ( [+*<>=~!@#%^&|`?]
      | '-' {_input.LA(1) != '-'}?
      | '/' {_input.LA(1) != '*'}?
      )+
    ;
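To make the effect of the two predicates concrete, here is a hand-written Python sketch of the same peek-without-consuming logic. It is an illustration only, not ANTLR-generated code; `scan_operator` is a hypothetical helper, and it assumes the predicate is checked against the character immediately after the `-` or `/`.

```python
# Same character set as the OPERATOR rule above, including '-' and '/'.
OP_CHARS = set("+*<>=~!@#%^&|`?-/")


def scan_operator(text: str, pos: int = 0) -> str:
    """Return the operator starting at text[pos], stopping before '--' or '/*'."""
    end = pos
    while end < len(text):
        ch = text[end]
        if ch not in OP_CHARS:
            break
        # Peek at the next character without consuming it.
        nxt = text[end + 1] if end + 1 < len(text) else ""
        # Mirrors '-' {_input.LA(1) != '-'}?
        if ch == "-" and nxt == "-":
            break
        # Mirrors '/' {_input.LA(1) != '*'}?
        if ch == "/" and nxt == "*":
            break
        end += 1
    return text[pos:end]
```

For instance, `scan_operator("<=-- comment")` gives `<=` and `scan_operator("@- 1")` gives `@-`. Note that under this reading a `-` is rejected whenever the next character is also `-`, so on `@---` the sketch stops after `@` and leaves `---` for the comment rule.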
However, the above rule does not address the restriction on a + or - at the end of an operator. To handle that as simply as possible, I would probably split the two cases into separate rules.
// this rule does not allow + or - at the end of an operator
OPERATOR
    : ( [*<>=~!@#%^&|`?]
      | ( '+'
        | '-' {_input.LA(1) != '-'}?
        )+
        [*<>=~!@#%^&|`?]
      | '/' {_input.LA(1) != '*'}?
      )+
    ;
// this rule allows + or - at the end of an operator and sets the token type to OPERATOR
// it requires a character from the special subset to appear
OPERATOR2
    : ( [*<>=+]
      | '-' {_input.LA(1) != '-'}?
      | '/' {_input.LA(1) != '*'}?
      )*
      [~!@#%^&|`?]
      OPERATOR?
      ( '+'
      | '-' {_input.LA(1) != '-'}?
      )+
      -> type(OPERATOR)
    ;
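The trailing + or - restriction can also be expressed as a post-processing step on an already scanned operator, which may be easier to reason about than the two grammar rules. The Python sketch below shows that alternative; it is not the approach used in the rules above, and `trim_trailing_sign` is a hypothetical helper.

```python
# Characters that permit a trailing + or - in a multicharacter operator.
SPECIAL_CHARS = set("~!@#%^&|`?")


def trim_trailing_sign(op: str) -> str:
    """Drop trailing + or - from a multicharacter operator with no special character."""
    if any(ch in SPECIAL_CHARS for ch in op):
        return op
    # Shorten the operator until it is one character long
    # or no longer ends in + or -.
    while len(op) > 1 and op[-1] in "+-":
        op = op[:-1]
    return op
```

Here `trim_trailing_sign("*-")` returns `*`, leaving the `-` to be scanned as a separate token, while `trim_trailing_sign("@-")` returns `@-` unchanged.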