ANTLR4: skip empty lines only


Question

I am using ANTLR4 to parse a text file, and I am new to it. Here is part of the file:

abcdef
//emptyline
abcdef

As a string in the file stream, it looks like this:

abcdef\r\n\r\nabcdef\r\n

ANTLR4 offers the "skip" command to discard things like spaces, tabs, and newline characters that match a regular-expression-like pattern while parsing, e.g.

WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

My problem is that I want to skip empty lines only. I don't want to skip every single "\r\n". In other words, when two or more "\r\n" sequences appear in a row, I want to keep the first one and skip the second and any that follow. How should I write the regular expression? Thank you.

grammar INIGrammar_1;
init: (section|NEWLINE)+ ;

section:  '[' phase_name ':' v ']' (contents)+ 
            | '[' phase_name ']' (contents)+ ; 
//
//
phase_name : STRING
            |MTT
            |MPI_GET
            |MPI_INSTALL
            |MPI_DETAILS
            |TEST_GET
            |TEST_BUILD
            |TEST_RUN
            |REPORTER
            ; 
v  : STRING ;      

contents: kvpairs 
          | include_section_pairs
          | if_statement
          | NEWLINE
          | EOT
          ;

keylhs : STRING
        ;
valuerhs : STRING 
          |multiline_valuerhs
          |kvpairs
          |url
          ;
kvpairs: keylhs '=' valuerhs NEWLINE
        ;
include_section_pairs: INCLUDE_SECTION '=' STRING
                    ;
if_statement: IF if_statement_condition THEN NEWLINE (ELSEIF if_statement_condition THEN NEWLINE)*? STRING NEWLINE IFEND NEWLINE
            ;
if_statement_condition:STRING '=' STRING ';'//here, semicolon has problem, either I use ';' or SEMICOLON
                        ;
multiline_valuerhs:STRING (',' (' ')*? ( '\\' (' ')*? NEWLINE)? STRING)+ 
                    ;
url:(' ')*?'http'':''//''www.';//ignore this, not finished.
IF: 'if';
ELSEIF:'elif';
IFEND:'fi';
THEN: 'then';
SEMICOLON: ';';
STRING : [a-z|A-Z|0-9|''| |.|\-|_|(|)|#|&|""|/|@|<|>|$]+ ;

//Keywords
MTT: 'MTT';
MPI_GET: 'MPI get';
MPI_INSTALL:'MPI install';
MPI_DETAILS:'MPI Details';
TEST_GET:'Test get';
TEST_BUILD: 'Test build';
TEST_RUN: 'Test run';
REPORTER: 'Reporter';
INCLUDE_SECTION: 'include_section';
//INCLUDE_SECTION_VALUE:STRING;
EOT:'EOT';

NEWLINE: ('\r' ? '\n')+ ;
WS : [\t]+ -> skip ; // skip spaces, tabs, newlines
COMMENT: '#' .*? '\r'?'\n' -> skip;
EMPTYLINE: '\r\n' -> skip;

Part of the INI file:

#======================================================================
# MPI run details
#======================================================================

[MPI Details: Open MPI]

# MPI tests
#exec = mpirun @hosts@ -np &test_np() @mca@ --prefix &test_prefix() &test_executable() &test_argv()
exec = mpirun @hosts@ -np &test_np() --prefix &test_prefix() &test_executable() &test_argv()

hosts = &if(&have_hostfile(), "--hostfile " . &hostfile(), \
            &if(&have_hostlist(), "--host " . &hostlist(), ""))

One more small thing: it seems that ";" cannot be matched as itself. ANTLR4 keeps saying it expects something else and treats the semicolon as an unknown symbol.

Recommended answer

The short answer to your question is that whitespace is not significant to your parser, so skip it all in the lexer.
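A minimal sketch of the "skip it all" approach. The grammar name SkipAllSketch and the simplified ID and EQ tokens are hypothetical and are not taken from the question's grammar; they only illustrate the idea:

```antlr
lexer grammar SkipAllSketch;   // hypothetical name, for illustration only

ID : [a-zA-Z0-9_]+ ;           // simplified stand-in for the question's STRING
EQ : '=' ;

// Spaces, tabs, and every line break (including empty lines) are dropped.
WS : [ \t\r\n]+ -> skip ;

// For the input "a = b\r\n\r\nc = d" the parser would see:
//   ID EQ ID ID EQ ID
```

With a lexer like this, the parser rules would no longer mention NEWLINE at all, since no such token is ever produced.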

The longer answer is to recognize that skipping whitespace (or any other character sequence) does not mean that it is not significant in the lexer. All it means is that no corresponding token is produced for consumption by the parser. Skipped whitespace will therefore still operate as a delimiter for generated tokens.
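If line ends do need to reach the parser (the question's kvpairs rule ends with NEWLINE), one possible alternative, which is not what the answer recommends, is to emit a single NEWLINE token for a whole run of line breaks, so that empty lines never appear as separate tokens. The sketch below assumes a hypothetical grammar name and a simplified ID token:

```antlr
lexer grammar KeepLineEndsSketch;   // hypothetical name, for illustration only

ID      : [a-zA-Z0-9_]+ ;           // simplified stand-in for the question's STRING

// One token per run of line breaks. The optional [ \t]* also absorbs
// lines that contain only spaces or tabs.
NEWLINE : ('\r'? '\n' [ \t]*)+ ;

// Remaining spaces and tabs are dropped but still separate tokens.
WS      : [ \t]+ -> skip ;

// For the input "abcdef\r\n\r\nabcdef\r\n" the parser would see:
//   ID NEWLINE ID NEWLINE
```

Note that the lexer always prefers the longest match and, on a tie, the rule defined first. In the question's grammar this means EMPTYLINE can never match, because NEWLINE is defined earlier and matches at least as much input.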

Other observations:

  1. ANTLR does not do regexes; thinking along those lines will lead to further conceptual difficulties.

  2. Don't ignore the warning and error messages produced when generating the lexer and parser. They almost always require correction before the generated code will function correctly.

  3. It really helps to verify that the lexer is producing your intended token stream before trying to debug parser rules. (The original post links to an answer that shows how to dump the token stream.)
