ANTLR 3 keywords and identifiers colliding

Views: 74   Published: 2021/4/24 19:38:08   Tags: tokenize antlr3 context-free-grammar

Problem description

Surprise: I am building a SQL-like language parser for a project.

I had it mostly working, but when I started testing it against the real requests it will have to handle, I realized it was behaving differently internally than I had thought.

The main issue in the following grammar is that I define a lexer rule PCT_CONTAINS for the language keyword 'pct_contains'. This works fine, but if I try to match a field like 'attributes.pct_vac', the field comes back with the text 'attributes.ac' and ANTLR reports this error:

line 1:15 mismatched character u'v' expecting 'c'

Grammar

grammar Select;

options {
  language=Python;
}

eval returns [value]
    : field EOF 
    ;

field returns [value]
    : fieldsegments {print $field.text}
    ;

fieldsegments
    : fieldsegment (DOT (fieldsegment))*
    ;

fieldsegment
    : ICHAR+ (USCORE ICHAR+)*
    ;

WS                      : ('\t' | ' ' | '\r' | '\n')+ {self.skip();};

ICHAR                   : ('a'..'z'|'A'..'Z');

PCT_CONTAINS            : 'pct_contains';

USCORE                  : '_';
DOT                     : '.';

I have been reading everything I can find on the topic: how the lexer consumes input as it finds it, even if that turns out to be wrong, and how semantic predicates or lookahead can be used to remove ambiguity. But nothing I have read has helped me fix this issue.

Honestly, I don't see how it even CAN be an issue. I must be missing something super obvious, because other grammars I see have lexer rules like EXISTS, but that doesn't cause the parser to take a string like 'existsOrNot' and spit out an IDENTIFIER with the text 'rNot'.

What am I missing or doing completely wrong?

Recommended answer

Convert your fieldsegment parser rule into a lexer rule. As it stands now, it will accept input like

"abc      
_     abc"

which is probably not what you want. The keyword "pct_contains" won't be matched by this rule since it is defined separately. If you want to accept the keyword in certain sequences as a regular identifier, you will have to include it in the accepted identifier rule.
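The advice above can be illustrated outside ANTLR with a small Python sketch. It is only a model of the idea, not ANTLR-generated code: the names tokenize, parse_field, KEYWORDS and allow_keywords are hypothetical, and the regular expression stands in for a single lexer rule that matches a whole field segment. The keyword check happens only after the longest possible segment has been matched, so 'pct_vac' is never treated as the start of 'pct_contains', and whitespace can no longer appear inside a segment.

```python
import re

# Keyword taken from the grammar in the question.
KEYWORDS = {"pct_contains"}

# One alternative per token type. A whole field segment is a single token,
# mirroring the shape of fieldsegment: letters, then optional "_letters" groups.
TOKEN_RE = re.compile(r"""
    (?P<WS>\s+)
  | (?P<IDENT>[A-Za-z]+(?:_[A-Za-z]+)*)
  | (?P<DOT>\.)
""", re.VERBOSE)


def tokenize(text):
    """Return (type, text) pairs, skipping whitespace.

    Raises ValueError on a character that starts no token.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(
                "unexpected character %r at column %d" % (text[pos], pos))
        kind = match.lastgroup
        lexeme = match.group()
        pos = match.end()
        if kind == "WS":
            continue
        # Decide keyword vs. identifier only after the longest match is known.
        if kind == "IDENT" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, lexeme))
    return tokens


def parse_field(text, allow_keywords=False):
    """Parse `segment (DOT segment)*` and return the list of segment texts.

    allow_keywords models "including the keyword in the accepted
    identifier rule": when True, a keyword may be used as a segment.
    """
    accepted = {"IDENT", "KEYWORD"} if allow_keywords else {"IDENT"}
    segments = []
    expect_segment = True
    for kind, lexeme in tokenize(text):
        if expect_segment:
            if kind not in accepted:
                raise ValueError("expected a field segment, got %r" % lexeme)
            segments.append(lexeme)
        elif kind != "DOT":
            raise ValueError("expected '.', got %r" % lexeme)
        expect_segment = not expect_segment
    if expect_segment:
        # Covers empty input and a trailing dot.
        raise ValueError("field ends without a segment")
    return segments
```

In this model, parse_field("attributes.pct_vac") returns the two segments 'attributes' and 'pct_vac', and an input such as "abc   _   abc" is rejected because '_' on its own starts no token. A field like 'attributes.pct_contains' is accepted only when allow_keywords is True, which corresponds to listing the keyword as an alternative in the identifier rule.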
