ANTLR: parse strings (keeping whitespace) and parse normal identifiers


Question

I am trying to use ANTLR4 to parse source files. One requirement is that a string literal may contain any kind of character, including whitespace, while a normal identifier contains only English letters and digits (whitespace around identifiers is thrown away).

I use the following ANTLR grammar (a minimal example), but it doesn't work as expected.

grammar parseString;

rules
    :   stringRule+
    ;

stringRule
    :   formatString
    |   idString
    ;

formatString
    :   STRING_DOUBLEQUOTE    STRING  STRING_DOUBLEQUOTE
    ;

idString
    :   (NONTERM | TERM)
    ;

// LEXER

STRING_DOUBLEQUOTE
    :   '"' ;

DIGITS
    :   DIGIT+
    ;

TERM
    :   UPPERCHAR CHAR+
    ;

NONTERM
    :   LOWERCHAR CHAR+
    ;

fragment
CHAR
    :   LOWERCHAR
    |   UPPERCHAR
    |   DIGIT
    |   '-'
    |   '_'
    ;

fragment
DIGIT
    :   [0-9]
    ;

fragment
LOWERCHAR
    :   [a-z]
    ;

fragment
UPPERCHAR
    :   [A-Z]
    ;

WS 
    :   (' ' | '\t' | '\r' | '\n')+ -> skip 
    ; // skip spaces, tabs, newlines

LINE_COMMENT
    :   '//' ~[\r\n]* -> skip
    ;

STRING
    :   ~('"')*
    ;

For the following test input,

Test
HelloWorld
"$this is a string"
"*this is another string!"

I get the error: line 1:0 extraneous input 'Test\nHelloWorld\n' expecting {'"', TERM, NONTERM}. The last two lines are parsed correctly as 'formatString'. The first two lines, however, are not matched as 'idString'; it looks as though the newline characters ('\n') are not being thrown away. What am I doing wrong?

Recommended answer

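Yes, there is a problem in this grammar. The token STRING matches 'Test\nHelloWorld\n'. The ANTLR lexer picks the rule that matches the longest stretch of input, and ~('"')* matches everything up to the next double quote, newlines included, so STRING wins over TERM and WS at the start of the file. All of that text ends up in a single STRING token, but no parser rule accepts a bare STRING token at that position.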

Consider changing the definition of the STRING token.
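
One possible way to act on that advice, offered as a sketch rather than as the answerer's own fix, is to make the quotes part of the STRING token so it can only begin at a double quote. The fragment below assumes the rest of the question's grammar is left unchanged, and that STRING_DOUBLEQUOTE can be dropped because nothing else refers to it. It does not handle escaped quotes inside a string.

```antlr
// Sketch only: these rules replace formatString, STRING_DOUBLEQUOTE and STRING
// from the question's grammar; every other rule stays as it is.

formatString
    :   STRING
    ;

// The quotes are now part of the token, so STRING can only start at a '"'
// and can no longer swallow identifiers or whitespace outside a string.
// Whitespace between the quotes is kept in the token text.
STRING
    :   '"' ~["]* '"'
    ;
```

With this change, 'Test' and 'HelloWorld' should be lexed as TERM tokens, the newlines between them are skipped by WS, and each quoted line becomes one STRING token. The token text includes the surrounding quotes, so they would need to be stripped in a listener or visitor if only the contents are wanted.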
