ANTLR: parse strings (keeping whitespace) and parse normal identifiers

Question

I am trying to use ANTLR4 to parse source files. String literals may contain any kind of character, including whitespace, which must be kept. Normal identifiers contain only English letters and digits, and whitespace around them should be thrown away.

I use the following ANTLR grammar (a minimal example), but it doesn't work as expected.

grammar parseString;

rules
    :   stringRule+
    ;

stringRule
    :   formatString
    |   idString
;

formatString
    :   STRING_DOUBLEQUOTE    STRING  STRING_DOUBLEQUOTE
    ;

idString
    :   (NONTERM | TERM)
    ;

// LEXER

STRING_DOUBLEQUOTE
    :   '"' ;

DIGITS
    :   DIGIT+
    ;

TERM
    :   UPPERCHAR CHAR+
    ;

NONTERM
    :   LOWERCHAR CHAR+
    ;

fragment
CHAR
    :   LOWERCHAR
    |   UPPERCHAR
    |   DIGIT
    |   '-'
    |   '_'
    ;

fragment
DIGIT
    :   [0-9]
    ;

fragment
LOWERCHAR
    :   [a-z]
    ;

fragment
UPPERCHAR
    :   [A-Z]
    ;

WS 
    :   (' ' | '\t' | '\r' | '\n')+ -> skip 
    ; // skip spaces, tabs, newlines

LINE_COMMENT
    :   '//' ~[\r\n]* -> skip
    ;

STRING
    :   ~('"')*
    ;

This is the test input I use:

Test
HelloWorld
"$this is a string"
"*this is another string!"

I get the error: line 1:0 extraneous input 'Test\nHelloWorld\n' expecting {'"', TERM, NONTERM}. The last two lines are correctly parsed as 'formatString'. The first two lines, however, are not matched as 'idString', apparently because the newline characters ('\n') are not thrown away. What am I doing wrong?

Recommended answer

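Yes, there is a problem in this grammar. The token STRING matches 'Test\nHelloWorld\n'. The ANTLR lexer prefers the rule that matches the longest run of input, and STRING (any characters except '"') matches everything up to the first double quote, which is longer than what TERM or WS can match at that position. All of that text therefore ends up in a single STRING token, but there is no parser rule that accepts a STRING token on its own.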

Consider changing the STRING token.
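
One possible way to change it, offered as a sketch and not as the answerer's own fix, is to make the whole quoted literal a single lexer token. STRING then only starts at a double quote, so it can no longer swallow identifiers and newlines, and the spaces inside the quotes are kept because they never reach the WS rule. The following assumes that a literal never contains an embedded or escaped double quote. The added EOF and the removal of the unused DIGITS and STRING_DOUBLEQUOTE rules are choices made for this sketch, not part of the original grammar.

```antlr
grammar parseString;

// EOF forces the parser to consume the whole input (added in this sketch).
rules
    :   stringRule+ EOF
    ;

stringRule
    :   formatString
    |   idString
    ;

// The quoted literal now arrives as one token, quotes included.
formatString
    :   STRING
    ;

idString
    :   (NONTERM | TERM)
    ;

// LEXER

// Starts and ends with a double quote, so it cannot match bare identifiers.
// Assumption: no embedded or escaped double quotes inside the literal.
STRING
    :   '"' ~["]* '"'
    ;

TERM
    :   UPPERCHAR CHAR+
    ;

NONTERM
    :   LOWERCHAR CHAR+
    ;

fragment CHAR      : LOWERCHAR | UPPERCHAR | DIGIT | '-' | '_' ;
fragment DIGIT     : [0-9] ;
fragment LOWERCHAR : [a-z] ;
fragment UPPERCHAR : [A-Z] ;

// Whitespace outside string literals is discarded.
WS
    :   [ \t\r\n]+ -> skip
    ;

LINE_COMMENT
    :   '//' ~[\r\n]* -> skip
    ;
```

With the test input from the question, this should tokenize as TERM, TERM, STRING, STRING. The text of each STRING token still includes the surrounding quotes, so they would need to be stripped in a listener or visitor if only the content is wanted.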
