How can I check if the first character of a line is "*" in ANTLR4?


Question

I am trying to write a parser for a relatively simple but idiosyncratic language.

Simply put, one of the rules is that an asterisk denotes a comment line only if that asterisk is the first character of the line. How might I go about formalising such a rule in ANTLR4? I thought about using:

START_LINE_COMMENT: '\n*' .*? '\n' -> skip; 

But I am certain this won't work with more than one comment line in a row: the newline at the end is consumed as part of the START_LINE_COMMENT token, so any subsequent comment line is missing the initial newline character the rule requires. Is there a way to check whether the line starts with a '*' without consuming the prior '\n'?
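The consumed-newline problem described above can be reproduced outside ANTLR. The following is only a rough analogy, assuming java.util.regex behaves closely enough to a lexer rule for this purpose; the class name ConsumedNewlineDemo and the helper countMatches are hypothetical names introduced for illustration. It counts how many comment lines are found when the pattern requires a leading newline, and how many are found when the pattern is anchored at the start of the line instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ConsumedNewlineDemo {
    /** Counts the non-overlapping matches of regex in input. */
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "code\n* comment 1\n* comment 2\ncode\n";

        // Regex analogue of '\n*' .*? '\n': the trailing newline is consumed,
        // so the second comment line no longer has a newline in front of it.
        System.out.println(countMatches("\\n\\*.*?\\n", input)); // prints 1

        // Anchoring at the start of the line consumes no newline at all.
        System.out.println(countMatches("(?m)^\\*.*$", input)); // prints 2
    }
}
```

The first pattern misses the second of two consecutive comment lines, which is the failure anticipated in the question.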

Recommended answer

Matching a comment line is not easy. As I write only about one grammar per year, I had to reach for The Definitive ANTLR Reference to refresh my memory. Try this:

grammar Question;

/* Comment line having an * in column 1. */

question
    :   line+
    ;

line
//    :   ( ID | INT )+
    :   ( ID | INT | MULT )+
    ;

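// The predicate runs after '*' has been consumed, so a position of 1
// means the asterisk was the first character of the line.
LINE_COMMENT
    :   '*' {getCharPositionInLine() == 1}? ~[\r\n]* -> channel(HIDDEN) ;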
ID  :   [a-zA-Z]+ ;
INT :   [0-9]+ ;
//WS  :   [ \t\r\n]+ -> channel(HIDDEN) ;
WS  :   [ \t\r\n]+ -> skip ;
MULT : '*' ;
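To make the column test in LINE_COMMENT concrete, here is a minimal hand-written sketch of the same idea in plain Java, without ANTLR. It is not how ANTLR implements the predicate; the class ColumnOneComments and the method findComments are hypothetical names introduced for illustration. The sketch checks the column before consuming the asterisk, so it compares against 0, whereas the grammar rule checks after consuming it and therefore compares against 1.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnOneComments {
    /** Returns every comment whose '*' is the first character of its line. */
    static List<String> findComments(String input) {
        List<String> comments = new ArrayList<>();
        int column = 0; // 0-based offset within the current line
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '*' && column == 0) {
                // Consume up to, but not including, the end of the line.
                int end = i;
                while (end < input.length()
                        && input.charAt(end) != '\r'
                        && input.charAt(end) != '\n') {
                    end++;
                }
                comments.add(input.substring(i, end));
                column += end - i;
                i = end;
            } else {
                // A newline starts a new line; anything else advances the column.
                column = (c == '\n') ? 0 : column + 1;
                i++;
            }
        }
        return comments;
    }

    public static void main(String[] args) {
        String data = "line 1\n* comment 1\n*comment 2\n * not a comment\nline 9    * no comment\n";
        System.out.println(findComments(data)); // prints [* comment 1, *comment 2]
    }
}
```

An asterisk preceded by a space, or one that appears later in a line, is left alone, which matches the behaviour the MULT token relies on in the grammar.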

Compile and run:

$ echo $CLASSPATH
.:/usr/local/lib/antlr-4.6-complete.jar:
$ alias
alias a4='java -jar /usr/local/lib/antlr-4.6-complete.jar'
alias grun='java org.antlr.v4.gui.TestRig'
$ a4 Question.g4 
$ javac Q*.java
$ grun Question question -tokens data.txt 
[@0,0:3='line',<ID>,1:0]
[@1,5:5='1',<INT>,1:5]
[@2,9:12='line',<ID>,2:2]
[@3,14:14='2',<INT>,2:7]
[@4,16:26='* comment 1',<LINE_COMMENT>,channel=1,3:0]
[@5,32:35='line',<ID>,4:4]
[@6,37:37='4',<INT>,4:9]
[@7,39:48='*comment 2',<LINE_COMMENT>,channel=1,5:0]
[@8,51:78='* comment 3 after empty line',<LINE_COMMENT>,channel=1,7:0]
[@9,81:81='*',<'*'>,8:1]
[@10,83:85='not',<ID>,8:3]
[@11,87:87='a',<ID>,8:7]
[@12,89:95='comment',<ID>,8:9]
[@13,97:100='line',<ID>,9:0]
[@14,102:102='9',<INT>,9:5]
[@15,107:107='*',<'*'>,9:10]
[@16,109:110='no',<ID>,9:12]
[@17,112:118='comment',<ID>,9:15]
[@18,120:119='<EOF>',<EOF>,10:0]

with the following data.txt file:

line 1
        line 2
* comment 1
    line 4
*comment 2

* comment 3 after empty line
 * not a comment
line 9    * no comment

Note that without the MULT token, or a literal '*' somewhere in a parser rule, the asterisk is not listed in the tokens and ANTLR reports an error instead:

line 8:1 token recognition error at: '*'

If you display the parse tree

$ grun Question question -gui data.txt

you'll see that the whole file is absorbed by a single line rule. If you need to recognize individual lines, change the line and whitespace rules like so:

line
    :   ( ID | INT | MULT )+ NL
    |   NL
    ;

//WS  :   [ \t\r\n]+ -> skip ;
NL  :   [\r\n] ;
WS  :   [ \t]+ -> skip ;
