Recognize multiple line comments within a single line with ANTLR4

Views: 38. Published: 2021/11/11 3:48:13. Tags: parsing, antlr, language-design, antlr4, lexer


Question


I want to parse PostScript code with ANTLR4. I have finished the grammar, but one particular language extension (which was introduced by someone else) is proving difficult to recognize.

A short example:

1: % This is a line comment
2: % The next line just pushes the value 10 onto the stack
3: 10
4: 
5: %?description This is the special line-comment in question
6: /procedure {
7:   /var1 30 def %This just creates a variable
8:   /var2 10 def %?description A description associated with var2 %?default 20
9:   /var3 (a string value) def %?description I am even allowed to use % signs %?default (another value)
10: }

Recognizing line comments, such as those in lines 1, 2 and 7, can be done with the lexer rules

LINE_COMMENT: '%' .*? NEWLINE;
NEWLINE: '\r'? '\n';

which simply match everything after a % until the end of the line.

The problem is with the special line comments that start with something like %?description or %?default. They should be recognized as well, but in contrast to LINE_COMMENT, several of them can appear on a single line (as in lines 8 and 9). Line 8, for example, contains two special comments: "%?description A description associated with var2" and "%?default 20".

Think of it as something like this (although this won't work):

SPECIAL_COMMENT: '%?' .*? (SPECIAL_COMMENT|NEWLINE);

Now comes the really tricky part: You should be allowed to put arbitrary text after %?description including % while still being able to split the individual comments.

So in short, the issue can be reduced to splitting a line of the form

(%?<keyword> <content with % allowed in it>)+ NEWLINE

e.g.

%?description descr. with % in in %?default (my default value for 100%) %?rest more

into

1.) %?description descr. with % in in 
2.) %?default (my default value for 100%)
3.) %?rest more

Any ideas on how to formulate lexer or parser rules to achieve this?

Solution

Given those rules, I think you'll have to use a predicate in the lexer to check the input stream for occurrences of %?. You'll also have to make sure that a normal comment starts with a % that is not followed by a ? (a % directly before a line break still counts as an empty normal comment).
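The two checks described above can be sketched in plain Java, without ANTLR. This is only an illustration of the idea: the class name CommentStart and the methods startsSpecial and startsNormal are hypothetical and are not part of the grammar that follows.

```java
public class CommentStart {

    // Hypothetical helper: true if the text at pos begins with the two characters "%?",
    // i.e. a special comment starts here.
    static boolean startsSpecial(String s, int pos) {
        return s.startsWith("%?", pos);
    }

    // Hypothetical helper: true if the text at pos begins a normal comment,
    // i.e. a '%' that is not immediately followed by '?'.
    // A '%' at the end of the input or right before a line break counts as an empty normal comment.
    static boolean startsNormal(String s, int pos) {
        return pos >= 0 && pos < s.length() && s.charAt(pos) == '%' && !startsSpecial(s, pos);
    }

    public static void main(String[] args) {
        System.out.println(startsNormal("% plain comment", 0));
        System.out.println(startsSpecial("%?default 20", 0));
        System.out.println(startsNormal("%?default 20", 0));
    }
}
```

In the grammar, the first check becomes the lexer predicate and the second is encoded directly in the character sets of the COMMENT rule.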

Given the grammar:

grammar T;

@lexer::members {
  boolean ahead(String text) {
    for (int i = 0; i < text.length(); i++) {
      if (text.charAt(i) != _input.LA(i + 1)) {
        return false;
      }
    }
    return true;
  }
}

parse
 : token* EOF
 ;

token
 : t=SPECIAL_COMMENT {System.out.println("special : " + $t.getText());}
 | t=COMMENT         {System.out.println("normal  : " + $t.getText());}
 ;

SPECIAL_COMMENT
 : '%?' ( {!ahead("%?")}? ~[\r\n] )*
 ;

COMMENT
 : '%' ( ~[?\r\n] ~[\r\n]* )?
 ;

SPACES
 : [ \t\r\n]+ -> skip
 ;

which can be tested as follows:

String source = "% normal comment\n" +
    "%?description I am even allowed to use % signs %?default (another value)\n" +
    "% another normal comment (without a line break!)";
TLexer lexer = new TLexer(new ANTLRInputStream(source));
TParser parser = new TParser(new CommonTokenStream(lexer));
parser.parse();

and will print the following:

normal  : % normal comment
special : %?description I am even allowed to use % signs 
special : %?default (another value)
normal  : % another normal comment (without a line break!)

The part ( {!ahead("%?")}? ~[\r\n] )* can be read as follows: if there's no "%?" ahead, match any char other than \r and \n, and do this zero or more times.
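To make that reading concrete, here is a minimal sketch that imitates the SPECIAL_COMMENT loop on a plain String instead of ANTLR's character stream. The class SpecialCommentSplitter and its split method are hypothetical names chosen for illustration, and the ahead helper here takes the input and a position explicitly, unlike the lexer member above.

```java
import java.util.ArrayList;
import java.util.List;

public class SpecialCommentSplitter {

    // Plain-String counterpart of the lexer's ahead(...): does text occur at position pos?
    static boolean ahead(String input, int pos, String text) {
        return input.startsWith(text, pos);
    }

    // Splits a line into consecutive "%?..." comments, imitating
    //   '%?' ( {!ahead("%?")}? ~[\r\n] )*
    // Anything before the first "%?" is ignored in this sketch.
    static List<String> split(String line) {
        List<String> comments = new ArrayList<>();
        int pos = line.indexOf("%?");
        while (pos >= 0 && pos < line.length()) {
            int start = pos;
            pos += 2; // consume the leading "%?"
            // Consume one character at a time while no "%?" is ahead
            // and the character is not a line break.
            while (pos < line.length()
                    && !ahead(line, pos, "%?")
                    && line.charAt(pos) != '\r'
                    && line.charAt(pos) != '\n') {
                pos++;
            }
            comments.add(line.substring(start, pos));
            if (!ahead(line, pos, "%?")) {
                break; // line break or end of input: no further special comment follows
            }
        }
        return comments;
    }

    public static void main(String[] args) {
        String line = "%?description descr. with % in in %?default (my default value for 100%) %?rest more";
        for (String comment : split(line)) {
            System.out.println("[" + comment + "]");
        }
    }
}
```

As with the lexer rule, a single % inside the text does not end a comment, and the space before the next %? stays attached to the preceding comment.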
