Antlr - Parsing Multiline #define for C.g4

Problem description

I am using ANTLR4 to parse C code. I want to parse multiline #defines along with the provided C.g4 grammar.

But the grammar mentioned in the link above does not support preprocessor directives, so I have added the following new rules to support preprocessing.

Link to my previous question

Whitespace
    :   [ \t]+
        -> channel(HIDDEN)
    ;

Newline
    :   (   '\r' '\n'?
        |   '\n'
        )
        -> channel(HIDDEN)
    ;

BlockComment
    :   '/*' .*? '*/'
    ;

LineComment
    :   '//' ~[\r\n]*
    ;


IncludeBlock
    :   '#' Whitespace? 'include' ~[\r\n]*
    ;

DefineStart
    :   '#' Whitespace? 'define'
    ;

DefineBlock
    :   DefineStart ~[\r\n]*
    ;

MultiDefine
    :   DefineStart MultiDefineBody
    ;

MultiDefineBody
    :   [\\] [\r\n]+ MultiDefineBody
    |   ~[\r\n]
    ;



preprocessorDeclaration
    :   includeDeclaration
    |   defineDeclaration
    ;

includeDeclaration
    :   IncludeBlock
    ;

defineDeclaration
    :   DefineBlock | MultiDefine
    ;

comment
    :   BlockComment
    |   LineComment
    ;

declaration
    :   declarationSpecifiers initDeclaratorList ';'
    |   declarationSpecifiers ';'
    |   staticAssertDeclaration
    |   preprocessorDeclaration
    |   comment
    ;

It works only for single-line preprocessor directives, and only if the MultiDefine rule is removed. For multiline #defines it does not work.

Any help would be appreciated.

By a multiline #define, I mean something like:

#define MACRO(num, str) {\
            printf("%d", num);\
            printf(" is");\
            printf(" %s number", str);\
            printf("\n");\
           }

Basically, I need a grammar that can parse the block above.

Recommended answer

I shamelessly copied this from here:

This is because of how ANTLR's lexer chooses between competing rules: it picks the rule that matches the longest input sequence, and only if several rules match equally long input does the rule specified first (in the grammar source) win.

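In your case, input that starts with DefineStart and continues up to a backslash and a line break (where DefineStart stands for input matching the rule of that name) is matched by DefineBlock, because the \ is consumed by the ~[\r\n]* construct, so DefineBlock covers the entire first line. MultiDefine cannot match more than that: MultiDefineBody needs the backslash immediately after DefineStart, and otherwise matches just a single character that is not \r or \n.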
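
To make the competition between the two rules concrete, here is a small Python model of them. It assumes ANTLR 4's lexer behavior of taking the longest match and using rule order only to break ties, and it hand-translates DefineBlock and MultiDefine (with Whitespace? written as [ \t]*) into Python regular expressions. The names RULES, longest_match, and pick_token are hypothetical; nothing here calls ANTLR.

```python
import re

# Hypothetical regex translations of the question's lexer rules
# (an illustration only, not code generated by ANTLR).
# "Whitespace?" with Whitespace : [ \t]+ is equivalent to [ \t]*.
DEFINE_START = r"#[ \t]*define"
RULES = [
    # Listed in grammar order: DefineBlock comes before MultiDefine.
    ("DefineBlock", re.compile(DEFINE_START + r"[^\r\n]*")),
    # MultiDefineBody : [\\] [\r\n]+ MultiDefineBody | ~[\r\n] ;
    # unrolled: zero or more "backslash + line breaks", then ONE other char.
    ("MultiDefine", re.compile(DEFINE_START + r"(?:\\[\r\n]+)*[^\r\n]")),
]


def longest_match(pattern, text):
    """Length of the longest prefix of text that pattern matches fully (0 if none)."""
    for end in range(len(text), 0, -1):
        if pattern.fullmatch(text, 0, end):
            return end
    return 0


def pick_token(text):
    """Mimic the lexer's choice: longest match wins, earlier rule wins ties."""
    best_name, best_len = None, 0
    for name, pattern in RULES:
        n = longest_match(pattern, text)
        if n > best_len:  # strict '>' keeps the earlier rule on ties
            best_name, best_len = name, n
    return best_name, best_len


source = '#define MACRO(num, str) {\\\n    printf("%d", num);\\\n}\n'
for name, pattern in RULES:
    print(name, longest_match(pattern, source))  # DefineBlock 26, then MultiDefine 8
print(pick_token(source))                        # ('DefineBlock', 26)
```

On this input DefineBlock matches the whole first line (26 characters, trailing backslash included), while MultiDefine stops after #define plus one space (8 characters), so DefineBlock wins. MultiDefine only wins when the backslash comes directly after #define, for example on "#define\\\nX".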

You now have two options: either tweak your current set of rules to work around this problem, or (my suggestion) simply use a single rule that matches a define statement, whether it spans one line or several.

Such a merged rule could look like this:

DefineBlock
    :   DefineStart (~[\\\r\n] | '\\' '\r'? '\n' | '\\' .)*
    ;

Note that this code is untested, but it should read like this: match DefineStart, followed by an arbitrarily long sequence in which each element is one of: a character other than \, \r, or \n; an escaped line break (a backslash followed by \n or \r\n); or a backslash followed by any other character. This allows line breaks to be escaped as required.
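
A minimal Python sketch of what the merged rule accepts, assuming the escaped-newline alternative is a single backslash literal ('\\' in ANTLR 4 syntax) followed by an optional \r and a \n, as the explanation describes. The function name match_define_block is hypothetical; it walks the rule's three alternatives by hand instead of running ANTLR.

```python
import re

# Hand-written model of the merged rule (hypothetical helper, not ANTLR output):
#   DefineBlock : DefineStart (~[\\\r\n] | '\\' '\r'? '\n' | '\\' .)* ;
# with DefineStart : '#' Whitespace? 'define' and Whitespace : [ \t]+.
DEFINE_START = re.compile(r"#[ \t]*define")


def match_define_block(text):
    """Return the length of the DefineBlock token at the start of text, or 0."""
    m = DEFINE_START.match(text)
    if not m:
        return 0
    i, n = m.end(), len(text)
    while i < n:
        c = text[i]
        if c not in "\\\r\n":                   # ~[\\\r\n]: ordinary character
            i += 1
        elif c == "\\":
            if text.startswith("\r\n", i + 1):  # '\\' '\r' '\n': escaped CRLF
                i += 3
            elif i + 1 < n:                     # '\\' '\n' or '\\' .: escaped char
                i += 2
            else:                               # lone trailing backslash: no alternative fits
                break
        else:                                   # unescaped '\r' or '\n' ends the token
            break
    return i


macro = (
    '#define MACRO(num, str) {\\\n'
    '            printf("%d", num);\\\n'
    '            printf(" is");\\\n'
    '           }\n'
    'int main(void) { return 0; }\n'
)
n = match_define_block(macro)
print(repr(macro[n:]))                             # '\nint main(void) { return 0; }\n'
print(match_define_block("#define N 10\nint x;"))  # 12
```

Because every step consumes either an ordinary character or a backslash together with the character(s) it escapes, the token runs through each escaped line break and stops at the first unescaped one, so single-line and multiline defines are handled by the same rule.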
