ANTLR4 negative lookahead in lexer

Posted: 2020/9/3 0:07:54, tagged antlr4


Question

I am trying to define lexer rules for PostgreSQL SQL.

The problem is that the operator definition and line comments conflict with each other.

For example, @--- is an operator token @- followed by the -- comment, not an operator token @---.

In grako it would be possible to define a negative lookahead for the - fragment like this:

OP_MINUS: '-' ! ( '-' ) .

In ANTLR4 I could not find any way to roll back an already consumed fragment.

Any ideas?

Here is the original definition of what a PostgreSQL operator can be:

The operator name is a sequence of up to NAMEDATALEN-1
(63 by default) characters from the following list:

 + - * / < > = ~ ! @ # % ^ & | ` ?

There are a few restrictions on your choice of name:
-- and /* cannot appear anywhere in an operator name,
since they will be taken as the start of a comment.

A multicharacter operator name cannot end in + or -,
unless the name also contains at least one of these
characters:

~ ! @ # % ^ & | ` ?

For example, @- is an allowed operator name, but *- is not.
This restriction allows PostgreSQL to parse SQL-compliant
commands without requiring spaces between tokens.
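The restrictions quoted above can be written down as a small standalone check. This is only a sketch in Python, not part of any grammar: the function name is_valid_operator_name and the constants are hypothetical, and the length limit assumes the default NAMEDATALEN mentioned in the quote.

```python
# Hypothetical helper that encodes the quoted naming rules literally.
OPERATOR_CHARS = set("+-*/<>=~!@#%^&|`?")
SPECIAL_CHARS = set("~!@#%^&|`?")
MAX_LEN = 63  # NAMEDATALEN-1 with the default setting


def is_valid_operator_name(name: str) -> bool:
    # Non-empty, within the length limit, and built only from operator characters.
    if not name or len(name) > MAX_LEN:
        return False
    if any(ch not in OPERATOR_CHARS for ch in name):
        return False
    # "--" and "/*" would start a comment, so they may not appear anywhere.
    if "--" in name or "/*" in name:
        return False
    # A multicharacter name may end in + or - only if it contains a special character.
    if len(name) > 1 and name[-1] in "+-":
        return any(ch in SPECIAL_CHARS for ch in name)
    return True
```

With this check, @- is accepted and *- is rejected, as in the quoted example, and @--- is rejected because it contains --.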

Recommended answer

You can use a semantic predicate in your lexer rules to perform lookahead (or lookbehind) without consuming characters. In the rule below, _input.LA(1) inspects the character that follows the '-' or '/' just matched. For example, the following covers several of the rules for an operator.

OPERATOR
  : ( [+*<>=~!@#%^&|`?]
    | '-' {_input.LA(1) != '-'}?
    | '/' {_input.LA(1) != '*'}?
    )+
  ;
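To make the effect of the two predicates concrete, here is a hand-written Python imitation of the loop in that rule. It is a sketch only, it does not use ANTLR, and the name scan_operator is hypothetical. It assumes the predicate sees the character immediately after the '-' or '/' under consideration.

```python
# Characters the rule accepts unconditionally (everything except '-' and '/').
UNCONDITIONAL = set("+*<>=~!@#%^&|`?")


def scan_operator(text: str, pos: int = 0) -> str:
    """Return the operator text starting at pos, imitating the rule above."""
    end = pos
    while end < len(text):
        ch = text[end]
        # One character of lookahead; empty string stands in for end of input.
        nxt = text[end + 1] if end + 1 < len(text) else ""
        if ch in UNCONDITIONAL:
            end += 1
        elif ch == "-" and nxt != "-":
            end += 1  # '-' is accepted only when it does not start "--"
        elif ch == "/" and nxt != "*":
            end += 1  # '/' is accepted only when it does not start "/*"
        else:
            break
    return text[pos:end]
```

Under this reading, scan_operator("<=-- note") stops at <= and leaves the comment untouched, and scan_operator("@-1") returns @-. For @--- the scan stops after @, because the first - is itself followed by another -.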

However, the rule above does not address the restriction on ending an operator with + or -. To handle that as simply as possible, I would probably split the two cases into separate rules.

// this rule does not allow + or - at the end of an operator
OPERATOR
  : ( [*<>=~!@#%^&|`?]
    | ( '+'
      | '-' {_input.LA(1) != '-'}?
      )+
      [*<>=~!@#%^&|`?]
    | '/' {_input.LA(1) != '*'}?
    )+
  ;

// this rule allows + or - at the end of an operator and sets the token type to OPERATOR
// it requires a character from the special subset to appear
OPERATOR2
  : ( [*<>=+]
    | '-' {_input.LA(1) != '-'}?
    | '/' {_input.LA(1) != '*'}?
    )*
    [~!@#%^&|`?]
    OPERATOR?
    ( '+'
    | '-' {_input.LA(1) != '-'}?
    )+
    -> type(OPERATOR)
  ;
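One way to check which strings the two rules accept is to approximate them with regular expressions, turning each predicate into a negative lookahead. This is a rough Python sketch rather than ANTLR output: the names RULE1, RULE2 and which_rule are hypothetical, and it tests whole isolated strings instead of tokenizing a stream.

```python
import re

BASIC_OR_SPECIAL = r"[*<>=~!@#%^&|`?]"
PLUS_OR_MINUS = r"(?:\+|-(?!-))"  # '-' only when not followed by another '-'
SLASH = r"/(?!\*)"                # '/' only when not followed by '*'

# Approximation of the first rule: cannot end in + or -.
RULE1 = rf"(?:{BASIC_OR_SPECIAL}|{PLUS_OR_MINUS}+{BASIC_OR_SPECIAL}|{SLASH})+"
# Approximation of OPERATOR2: needs a special character, then ends in + or -.
RULE2 = rf"(?:[*<>=+]|-(?!-)|{SLASH})*[~!@#%^&|`?](?:{RULE1})?{PLUS_OR_MINUS}+"


def which_rule(name: str):
    """Return the name of the rule whose pattern matches the whole string, or None."""
    if re.fullmatch(RULE1, name):
        return "OPERATOR"
    if re.fullmatch(RULE2, name):
        return "OPERATOR2"
    return None
```

In this approximation @- is matched by the OPERATOR2 pattern, <= by the first pattern, and *- by neither. A lone + or - also matches neither pattern, so a complete grammar would presumably need separate rules for those two single-character operators.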
