Recognize multiple line comments within a single line with ANTLR4
Question
I want to parse PostScript code with ANTLR4. I have finished the grammar, but one particular language extension (introduced by someone else) is proving hard to recognize.
A short example:
1: % This is a line comment
2: % The next line just pushes the value 10 onto the stack
3: 10
4:
5: %?description This is the special line-comment in question
6: /procedure {
7: /var1 30 def %This just creates a variable
8: /var2 10 def %?description A description associated with var2 %?default 20
9: /var3 (a string value) def %?description I am even allowed to use % signs %?default (another value)
10: }
Line comments such as those in lines 1, 2 and 7 can be recognized with the lexer rules
LINE_COMMENT: '%' .*? NEWLINE;
NEWLINE: '\r'? '\n';
which simply match everything after a % until the end of the line.
The problem I have is with the special line comments that start with something like %?description or %?default. These should be recognized as well, but in contrast to LINE_COMMENT, several of them can appear on a single line (as in lines 8 and 9). So line 8 contains two special comments: %?description A description associated with var2 and %?default 20.
Think of it as something like this (although this won't work):
SPECIAL_COMMENT: '%?' .*? (SPECIAL_COMMENT|NEWLINE);
Now comes the really tricky part: arbitrary text, including %, should be allowed after %?description, while it must still be possible to split the line into the individual comments.
So in short, the issue can be reduced to splitting a line of the form
(%?<keyword> <content with % allowed in it>)+ NEWLINE
For example, splitting
%?description descr. with % in in %?default (my default value for 100%) %?rest more
into
1.) %?description descr. with % in in
2.) %?default (my default value for 100%)
3.) %?rest more
Any ideas on how to formulate lexer or parser rules to achieve this?
Answer
Given those rules, I think you'll have to use a predicate in the lexer to check the input stream for occurrences of %?. You'll also have to make sure that a normal comment starts with a % that is not followed by a ?. A bare % directly before a line break still counts as an (empty) normal comment.
Given the grammar:
grammar T;

@lexer::members {
  // Returns true if the upcoming characters in the input stream equal 'text'.
  boolean ahead(String text) {
    for (int i = 0; i < text.length(); i++) {
      if (text.charAt(i) != _input.LA(i + 1)) {
        return false;
      }
    }
    return true;
  }
}

parse
 : token* EOF
 ;

token
 : t=SPECIAL_COMMENT {System.out.println("special : " + $t.getText());}
 | t=COMMENT         {System.out.println("normal  : " + $t.getText());}
 ;

// "%?" followed by any chars on the same line, stopping before the next "%?"
SPECIAL_COMMENT
 : '%?' ( {!ahead("%?")}? ~[\r\n] )*
 ;

// "%" that is not followed by "?", up to the end of the line
COMMENT
 : '%' ( ~[?\r\n] ~[\r\n]* )?
 ;

SPACES
 : [ \t\r\n]+ -> skip
 ;
which can be tested as follows:
String source = "% normal comment\n" +
    "%?description I am even allowed to use % signs %?default (another value)\n" +
    "% another normal comment (without a line break!)";
// ANTLRInputStream is deprecated since ANTLR 4.7; CharStreams.fromString(source) replaces it there.
TLexer lexer = new TLexer(new ANTLRInputStream(source));
TParser parser = new TParser(new CommonTokenStream(lexer));
parser.parse();
and prints the following:
normal : % normal comment
special : %?description I am even allowed to use % signs
special : %?default (another value)
normal : % another normal comment (without a line break!)
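As a cross-check outside ANTLR, the two lexer rules can be approximated with java.util.regex, where a negative lookahead plays the role of the predicate. This is only a sketch under assumptions: the class name CommentRegexSketch and the method scan are made up for illustration, and unlike a lexer, Matcher.find() silently skips any text that matches neither alternative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the grammar above.
class CommentRegexSketch {
    // Group 1 mirrors SPECIAL_COMMENT: "%?" then chars on the same line,
    // each one only consumed if "%?" does not start at that position.
    // Group 2 mirrors COMMENT: "%" optionally followed by a char that is
    // neither '?' nor a line break, then the rest of the line.
    static final Pattern COMMENTS = Pattern.compile(
        "(%\\?(?:(?!%\\?)[^\\r\\n])*)|(%(?:[^?\\r\\n][^\\r\\n]*)?)");

    // Returns one "kind : text" entry per comment found in the source.
    static List<String> scan(String source) {
        List<String> out = new ArrayList<>();
        Matcher m = COMMENTS.matcher(source);
        while (m.find()) {
            String kind = (m.group(1) != null) ? "special" : "normal";
            out.add(kind + " : " + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String source = "% normal comment\n"
            + "%?description I am even allowed to use % signs %?default (another value)\n"
            + "% another normal comment (without a line break!)";
        scan(source).forEach(System.out::println);
    }
}
```

Like the lexer rule, the regex leaves the space before the next %? inside the preceding special comment, so callers may want to trim each piece.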
The part ( {!ahead("%?")}? ~[\r\n] )* can be read as follows: if there is no "%?" ahead, match any char other than \r and \n, and do this zero or more times.
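To make that reading concrete, here is a minimal plain-Java sketch of the same loop, working on a String and an index instead of the lexer's _input.LA(...). The class AheadSketch and its methods are hypothetical names chosen for illustration, and split assumes the line starts with "%?".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical emulation of the SPECIAL_COMMENT rule; not generated by ANTLR.
class AheadSketch {
    // Counterpart of the grammar's ahead(...): does 'text' start at 'pos'?
    static boolean ahead(String input, int pos, String text) {
        return input.startsWith(text, pos);
    }

    // Returns the end index (exclusive) of the special comment that starts
    // with "%?" at 'start'.
    static int specialCommentEnd(String input, int start) {
        int pos = start + 2; // consume the leading "%?"
        // ( {!ahead("%?")}? ~[\r\n] )* : keep consuming while no "%?" is
        // ahead and the current char is not a line break.
        while (pos < input.length()
                && !ahead(input, pos, "%?")
                && input.charAt(pos) != '\r'
                && input.charAt(pos) != '\n') {
            pos++;
        }
        return pos;
    }

    // Splits one line of the form (%?<keyword> <content>)+ into its comments.
    static List<String> split(String line) {
        List<String> parts = new ArrayList<>();
        int pos = 0;
        while (ahead(line, pos, "%?")) {
            int end = specialCommentEnd(line, pos);
            parts.add(line.substring(pos, end));
            pos = end;
        }
        return parts;
    }

    public static void main(String[] args) {
        String line = "%?description descr. with % in in "
            + "%?default (my default value for 100%) %?rest more\n";
        split(line).forEach(part -> System.out.println("[" + part + "]"));
    }
}
```

A lone % inside the content never stops the loop because only the two-character sequence %? is checked, which is what lets "100%)" stay inside the %?default comment.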