ANTLR4: skip empty lines only

Question

I am new to ANTLR4 and am using it to parse a text file. Here is part of the file:

abcdef
//emptyline
abcdef

As a raw character stream, it looks like this:

abcdef\r\n\r\nabcdef\r\n

ANTLR4 offers the "skip" lexer command, which discards input such as spaces, tabs, and newlines matched by a lexer rule, e.g.:

WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

My problem is that I want to skip empty lines only. I don't want to skip every single "\r\n". In other words, when two or more "\r\n" sequences appear in a row, I only want to skip the second one and any that follow. How should I write the regular expression? Thank you.

grammar INIGrammar_1;
init: (section|NEWLINE)+ ;

section:  '[' phase_name ':' v ']' (contents)+ 
            | '[' phase_name ']' (contents)+ ; 
//
//
phase_name : STRING
            |MTT
            |MPI_GET
            |MPI_INSTALL
            |MPI_DETAILS
            |TEST_GET
            |TEST_BUILD
            |TEST_RUN
            |REPORTER
            ; 
v  : STRING ;      

contents: kvpairs 
          | include_section_pairs
          | if_statement
          | NEWLINE
          | EOT
          ;

keylhs : STRING
        ;
valuerhs : STRING 
          |multiline_valuerhs
          |kvpairs
          |url
          ;
kvpairs: keylhs '=' valuerhs NEWLINE
        ;
include_section_pairs: INCLUDE_SECTION '=' STRING
                    ;
if_statement: IF if_statement_condition THEN NEWLINE (ELSEIF if_statement_condition THEN NEWLINE)*? STRING NEWLINE IFEND NEWLINE
            ;
if_statement_condition:STRING '=' STRING ';'//here, semicolon has problem, either I use ';' or SEMICOLON
                        ;
multiline_valuerhs:STRING (',' (' ')*? ( '\\' (' ')*? NEWLINE)? STRING)+ 
                    ;
url:(' ')*?'http'':''//''www.';//ignore this, not finished.
IF: 'if';
ELSEIF:'elif';
IFEND:'fi';
THEN: 'then';
SEMICOLON: ';';
STRING : [a-z|A-Z|0-9|''| |.|\-|_|(|)|#|&|""|/|@|<|>|$]+ ;

//Keywords
MTT: 'MTT';
MPI_GET: 'MPI get';
MPI_INSTALL:'MPI install';
MPI_DETAILS:'MPI Details';
TEST_GET:'Test get';
TEST_BUILD: 'Test build';
TEST_RUN: 'Test run';
REPORTER: 'Reporter';
INCLUDE_SECTION: 'include_section';
//INCLUDE_SECTION_VALUE:STRING;
EOT:'EOT';

NEWLINE: ('\r' ? '\n')+ ;
WS : [\t]+ -> skip ; // skip spaces, tabs, newlines
COMMENT: '#' .*? '\r'?'\n' -> skip;
EMPTYLINE: '\r\n' -> skip;

Part of the INI file:

#======================================================================
# MPI run details
#======================================================================

[MPI Details: Open MPI]

# MPI tests
#exec = mpirun @hosts@ -np &test_np() @mca@ --prefix &test_prefix() &test_executable() &test_argv()
exec = mpirun @hosts@ -np &test_np() --prefix &test_prefix() &test_executable() &test_argv()

hosts = &if(&have_hostfile(), "--hostfile " . &hostfile(), \
            &if(&have_hostlist(), "--host " . &hostlist(), ""))

One more small thing: it seems that ";" cannot be matched as itself. ANTLR4 keeps reporting that it expects something else and treats the semicolon as an unknown symbol.

Recommended answer

The short answer to your question is that whitespace is not significant to your parser, so skip it all in the lexer.

The longer answer is to recognize that skipping whitespace (or any other character sequence) does not mean that it is not significant in the lexer. All it means is that no corresponding token is produced for consumption by the parser. Skipped whitespace will therefore still operate as a delimiter for generated tokens.
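
To make both answers concrete, here is a minimal Python sketch. It is not ANTLR and not its API; the rule set and the tokenize helper are hypothetical. Python's re module matches each individual rule, while the choice between rules imitates the ANTLR lexer's policy: the longest match wins, and ties go to the rule defined first. With all whitespace skipped, including \r and \n, the empty line disappears entirely. The skipped characters still end the preceding token, so "abc def" yields two tokens rather than "abcdef".

```python
import re
from typing import List, Tuple

# Hypothetical two-rule lexer (an assumption for illustration, not the question's grammar):
#   STRING : letters/digits
#   WS     : spaces, tabs, CR, LF  -> skip
RULES = [
    ("STRING", re.compile(r"[A-Za-z0-9]+"), False),
    ("WS", re.compile(r"[ \t\r\n]+"), True),
]


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Return (token name, text) pairs for every non-skipped match."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None  # (length, name, skip) of the winning rule so far
        for name, pattern, skip in RULES:
            m = pattern.match(text, pos)
            # Strictly longer wins, so on a tie the earlier rule is kept.
            if m and (best is None or m.end() - pos > best[0]):
                best = (m.end() - pos, name, skip)
        if best is None:
            raise ValueError(f"token recognition error at offset {pos}: {text[pos]!r}")
        length, name, skip = best
        if not skip:
            tokens.append((name, text[pos:pos + length]))
        # A skipped match emits nothing, but consuming it still ends the previous token.
        pos += length
    return tokens


print(tokenize("abcdef\r\n\r\nabcdef\r\n"))  # [('STRING', 'abcdef'), ('STRING', 'abcdef')]
print(tokenize("abc def"))                  # [('STRING', 'abc'), ('STRING', 'def')]
```

In a real grammar the parser would then simply never see line breaks, which is the trade-off the answer recommends when whitespace carries no meaning for the parser.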

A few additional observations:

  1. ANTLR does not do regexes; thinking along those lines will lead to further conceptual difficulties.

  2. Don't ignore warning and error messages produced when generating the lexer/parser; they almost always require correction before the generated code will function correctly.

  3. It really helps to verify that the lexer is producing your intended token stream before trying to debug parser rules. See this answer that shows how to dump the token stream.
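
The token-dump advice can be tried on a toy scale. This sketch is again hypothetical Python, not the ANTLR runtime. It approximates a few of the question's lexer rules, narrowed for illustration and kept in their original order, and prints each emitted token roughly in the [@index,start:stop='text',<TYPE>,line:column] style of ANTLR token dumps. On the question's sample input, the dump shows that NEWLINE: ('\r' ? '\n')+ already consumes a whole run of line breaks as one token. EMPTYLINE never wins: on "\r\n\r\n" NEWLINE matches more characters, and on a lone "\r\n" the two tie and NEWLINE is defined first.

```python
import re
from typing import List

# Simplified, hypothetical versions of some lexer rules from the question,
# in the question's order. Each entry: (token name, compiled pattern, skip?).
RULES = [
    ("STRING", re.compile(r"[A-Za-z0-9]+"), False),  # far narrower than the real STRING
    ("NEWLINE", re.compile(r"(?:\r?\n)+"), False),   # NEWLINE: ('\r' ? '\n')+ ;
    ("WS", re.compile(r"\t+"), True),                # WS : [\t]+ -> skip ;
    ("EMPTYLINE", re.compile(r"\r\n"), True),        # EMPTYLINE: '\r\n' -> skip;
]


def dump_tokens(text: str) -> List[str]:
    """Lex `text` and return one descriptive line per emitted (non-skipped) token."""
    out = []
    pos, line, col = 0, 1, 0  # line is 1-based and column 0-based, as in ANTLR
    while pos < len(text):
        best_len, best_name, best_skip = 0, None, False
        for name, pattern, skip in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the winner, so ties go to the earlier rule.
            if m and m.end() - pos > best_len:
                best_len, best_name, best_skip = m.end() - pos, name, skip
        if best_name is None:
            raise ValueError(f"token recognition error at {line}:{col}: {text[pos]!r}")
        lexeme = text[pos:pos + best_len]
        if not best_skip:
            shown = lexeme.replace("\r", "\\r").replace("\n", "\\n")
            out.append(f"[@{len(out)},{pos}:{pos + best_len - 1}='{shown}',<{best_name}>,{line}:{col}]")
        for ch in lexeme:  # advance line/column over the consumed characters
            if ch == "\n":
                line, col = line + 1, 0
            else:
                col += 1
        pos += best_len
    return out


for entry in dump_tokens("abcdef\r\n\r\nabcdef\r\n"):
    print(entry)
# [@0,0:5='abcdef',<STRING>,1:0]
# [@1,6:9='\r\n\r\n',<NEWLINE>,1:6]
# [@2,10:15='abcdef',<STRING>,3:0]
# [@3,16:17='\r\n',<NEWLINE>,3:6]
```

Seeing a single NEWLINE token where the blank line was makes it clear that no separate "skip the second \r\n" rule is needed in this case.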
