Parse comment line

Views: 26 · Published: 2021/11/11 3:36:57 · Tags: line, antlr, grammar, comments


Problem description


Given the following basic grammar, I want to understand how to handle comment lines. What is missing is the handling of the <CR><LF> that usually terminates a comment line. The only exception is a final comment line directly before EOF, e.g.:

# comment
abcd := 12 ;
# comment eof without <CR><LF>

grammar CommentLine1a;

//==========================================================
// Options
//==========================================================



//==========================================================
// Lexer Rules
//==========================================================

Int
  : Digit+
  ;

fragment Digit
  : '0'..'9'
  ;

ID_NoDigitStart
  : ( 'a'..'z' | 'A'..'Z' ) ('a'..'z' | 'A'..'Z' | Digit )*
  ;

Whitespace
  : ( ' ' | '\t' | '\r' | '\n' )+ { $channel = HIDDEN ; }
  ; 


//==========================================================
// Parser Rules
//==========================================================

code
  : ( assignment | comment )+
  ;

assignment
  : id_NoDigitStart ':=' id_DigitStart ';'
  ;

id_NoDigitStart
  : ID_NoDigitStart
  ;  

id_DigitStart
  : ( ID_NoDigitStart | Int )+
  ;

comment
  : '#' ~( '\r' | '\n' )*
  ;

Solution

Unless you have a very compelling reason to put the comment inside the parser (which I'd like to hear), you should put it in the lexer:

Comment
  :  '#' ~( '\r' | '\n' )*
  ;

And since you already account for line breaks in your Whitespace rule, there's no problem with input like "# comment eof without <CR><LF>".
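
A minimal sketch of that suggestion, in the same ANTLR 3 syntax as the question. Two things here are assumptions rather than part of the answer: sending Comment to the hidden channel (mirroring what the Whitespace rule already does), and rewriting code as assignment* EOF so that the parser no longer refers to comments at all. The remaining rules are assumed to stay as they are in the question.

```antlr
// Parser rule: comments are handled entirely by the lexer,
// so the parser only sees assignments (assumption: a file may
// also consist of comments only, hence '*' rather than '+').
code
  : assignment* EOF
  ;

// Lexer rule: '#' followed by any characters except line breaks.
// The rule ends at '\r', '\n' or EOF, whichever comes first, so a
// final comment without <CR><LF> is matched as well.
Comment
  : '#' ~( '\r' | '\n' )* { $channel = HIDDEN; }
  ;
```

With this arrangement the line break after a comment is consumed by Whitespace, which is why no separate <CR><LF> handling is needed.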

Also, if you use literal tokens inside parser rules, ANTLR automatically creates lexer rules for them behind the scenes. So in your case:

comment
  :  '#' ~( '\r' | '\n' )*
  ;

would match a '#' token followed by zero or more tokens other than the '\r' and '\n' tokens, and not zero or more characters other than '\r' and '\n'.

For future reference:

Inside parser rules

~ negates tokens
. matches any token

Inside lexer rules

~ negates characters
. matches any character in the range 0x0000 ... 0xFFFF
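
To make the difference concrete, here is a small ANTLR 3 sketch. The grammar name TildeDemo and the rules untilSemi, Hash, Semi, Word and Space are hypothetical and are not part of the question's grammar; they only illustrate where ~ works on tokens and where it works on characters.

```antlr
grammar TildeDemo;

// Parser rule: ~ negates tokens.
// Matches a Hash token, then any tokens that are not Semi, then a Semi token.
untilSemi
  : Hash ( ~Semi )* Semi
  ;

// Lexer rules: ~ negates characters.
Hash
  : '#'
  ;

Semi
  : ';'
  ;

// One or more characters that are not '#', ';' or whitespace.
Word
  : ~( '#' | ';' | ' ' | '\t' | '\r' | '\n' )+
  ;

Space
  : ( ' ' | '\t' | '\r' | '\n' )+ { $channel = HIDDEN; }
  ;
```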

