AST from C code with preprocessor directives


Question

How can I build an AST (Abstract Syntax Tree) from GCC C code, apply a transformation like the one below, and then regenerate C source code from the modified tree?

    if(condition_1){
     //lines of code 1
    }
    #ifdef expression_1
        else if(condition_2){
           //lines of code 2
        }
   #endif

into

    bool test = condition_1;
    if(test){
     //lines of code 1
    }
    #ifdef expression_1
      if(!(test) && condition_2){
        //lines of code 2
      }
    #endif

Recommended answer

GCC itself will build ASTs, but not before expanding the preprocessor directives. So the preprocessor conditionals are gone. Reinstalling them after you have done the transformations will be extremely hard. Doing transformations that involved the conditionals themselves will be impossible. So GCC itself is not a good way to get the ASTs you want.

If you want to parse your code example (the conditional wrapped around the else if is really nice!), you need a reengineering parser. These are parsers designed to support refactoring. Such parsers need to capture more than traditional parsers do, e.g., column numbers of tokens, the format of lexical items, etc., to enable the regeneration of source text from the modified tree. For C, such a parser must capture the preprocessor directives, too. These are pretty rare.
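To make the "capture more than traditional parsers" point concrete, here is a minimal sketch in C of the kind of per-token record such a parser might keep. The `Token` type, the `TokenKind` values, and the `regenerate` function are hypothetical names invented for illustration; they are not part of DMS or GCC, and a real reengineering parser records far more (comments, number radix, and so on). The sketch assumes tokens arrive sorted by position and that each token's text contains no newline.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token kinds: ordinary code vs. a preprocessor directive
   that the parser keeps instead of expanding. */
typedef enum { TOK_CODE, TOK_DIRECTIVE } TokenKind;

/* Hypothetical token record with the position data needed to
   reproduce the original layout. */
typedef struct {
    TokenKind   kind;
    int         line;    /* 1-based source line */
    int         column;  /* 1-based source column */
    const char *text;    /* original spelling of the token */
} Token;

/* Re-emit tokens at their recorded positions, padding with newlines
   and spaces. Returns the number of characters written (excluding the
   terminating NUL), or -1 if the buffer is too small. */
int regenerate(const Token *toks, size_t n, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    size_t len = 0;
    int line = 1, col = 1;

    for (size_t i = 0; i < n; i++) {
        while (line < toks[i].line) {          /* move down to the token's line */
            if (len + 1 >= cap) return -1;
            out[len++] = '\n';
            line++;
            col = 1;
        }
        while (col < toks[i].column) {         /* move right to the token's column */
            if (len + 1 >= cap) return -1;
            out[len++] = ' ';
            col++;
        }
        size_t tl = strlen(toks[i].text);
        if (len + tl >= cap) return -1;
        memcpy(out + len, toks[i].text, tl);   /* copy the original spelling */
        len += tl;
        col += (int)tl;
    }
    out[len] = '\0';
    return (int)len;
}
```

Because the directive is stored as a token with its own line and column, a transformation can move or rewrite the code around it and the `#ifdef` still comes back out in the regenerated text. A parser that runs the preprocessor first never has that token to begin with.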

One such reengineering parser is our DMS Software Reengineering Toolkit and its C front end, which handles many dialects of C including GCC 2/3/4/5. It is designed explicitly to capture preprocessor conditionals (including your specific example). DMS also has support for carrying out the transformations using source-to-source transformations.

For a changed-to-make-legal version of OP's example, placed in test.c:

void main () {
  if (condition_1) {
     x++; 
  }
  #ifdef expression_1
  else if (condition_2) {
         y++;
       }
  #endif
}

... the DMS C~GCC4 parser (out of the box) produces the following AST:

C:\DMS\Domains\C\GCC4\Tools\Parser\Source>run ..\domainparser ++AST C:\temp\test.c
C~GCC4 Domain Parser Version 3.0.1(28449)
Copyright (C) 1996-2015 Semantic Designs, Inc; All Rights Reserved; SD Confidential
Powered by DMS (R) Software Reengineering Toolkit
AST Optimizations: remove constant tokens, remove unary productions, compact sequences
Using encoding Unicode-UTF-8?ANSI +CRLF +1 /^I

28 tree nodes in tree.
(translation_unit@C~GCC4=2#3cde920^0 Line 1 Column 1 File C:/temp/test.c
 (function_definition@C~GCC4=966#3cde740^1#3cde920:1 Line 1 Column 1 File C:/temp/test.c
  (function_head@C~GCC4=967#3047320^1#3cde740:1 Line 1 Column 1 File C:/temp/test.c
   (simple_type_specifier@C~GCC4=686#3047180^1#3047320:1 Line 1 Column 1 File C:/temp/test.c)simple_type_specifier
   (direct_declarator@C~GCC4=852#3047380^1#3047320:2 Line 1 Column 6 File C:/temp/test.c
   |(IDENTIFIER@C~GCC4=1531#3047160^1#3047380:1[`main'] Line 1 Column 6 File C:/temp/test.c)IDENTIFIER
   |(parameter_declaration_clause@C~GCC4=900#30473c0^1#3047380:2 Line 1 Column 12 File C:/temp/test.c)parameter_declaration_clause
   )direct_declarator#3047380
  )function_head#3047320
  (compound_statement@C~GCC4=507#3cde1e0^1#3cde740:2 Line 1 Column 14 File C:/temp/test.c
   (selection_statement@C~GCC4=539#3cde940^1#3cde1e0:1 Line 2 Column 3 File C:/temp/test.c
   |(if_head@C~GCC4=550#30476e0^1#3cde940:1 Line 2 Column 3 File C:/temp/test.c
   | (IDENTIFIER@C~GCC4=1531#30473e0^1#30476e0:1[`condition_1'] Line 2 Column 7 File C:/temp/test.c)IDENTIFIER
   |)if_head#30476e0
   |(compound_statement@C~GCC4=507#3cde700^1#3cde940:2 Line 2 Column 20 File C:/temp/test.c
   | (expression_statement@C~GCC4=503#3047740^1#3cde700:1 Line 3 Column 6 File C:/temp/test.c
   |  (postfix_expression@C~GCC4=205#3047720^1#3047740:1 Line 3 Column 6 File C:/temp/test.c
   |   (IDENTIFIER@C~GCC4=1531#3047700^1#3047720:1[`x'] Line 3 Column 6 File C:/temp/test.c)IDENTIFIER
   |  )postfix_expression#3047720
   | )expression_statement#3047740
   |)compound_statement#3cde700
   |(if_directive@C~GCC4=1088#3cde7a0^1#3cde940:3 Line 5 Column 3 File C:/temp/test.c
   | ('#'@C~GCC4=1548#3cde820^1#3cde7a0:1[Keyword:0] Line 5 Column 3 File C:/temp/test.c)'#'
   | (IDENTIFIER@C~GCC4=1531#3cde1c0^1#3cde7a0:2[`expression_1'] Line 5 Column 10 File C:/temp/test.c)IDENTIFIER
   | (new_line@C~GCC4=1578#3cde800^1#3cde7a0:3[Keyword:0] Line 5 Column 22 File C:/temp/test.c)new_line
   |)if_directive#3cde7a0
   |(selection_statement@C~GCC4=527#3cde840^1#3cde940:4 Line 6 Column 8 File C:/temp/test.c
   | (IDENTIFIER@C~GCC4=1531#3047340^1#3cde840:1[`condition_2'] Line 6 Column 12 File C:/temp/test.c)IDENTIFIER
   | (compound_statement@C~GCC4=507#3cde860^1#3cde840:2 Line 6 Column 25 File C:/temp/test.c
   |  (expression_statement@C~GCC4=503#3cde8a0^1#3cde860:1 Line 7 Column 12 File C:/temp/test.c
   |   (postfix_expression@C~GCC4=205#3cde880^1#3cde8a0:1 Line 7 Column 12 File C:/temp/test.c
   |   |(IDENTIFIER@C~GCC4=1531#3cde780^1#3cde880:1[`y'] Line 7 Column 12 File C:/temp/test.c)IDENTIFIER
   |   )postfix_expression#3cde880
   |  )expression_statement#3cde8a0
   | )compound_statement#3cde860
   |)selection_statement#3cde840
   |(endif_directive@C~GCC4=1092#3cde8c0^1#3cde940:5 Line 9 Column 3 File C:/temp/test.c
   | ('#'@C~GCC4=1548#3cde900^1#3cde8c0:1[Keyword:0] Line 9 Column 3 File C:/temp/test.c)'#'
   | (new_line@C~GCC4=1578#3cde8e0^1#3cde8c0:2[Keyword:0] Line 9 Column 9 File C:/temp/test.c)new_line
   |)endif_directive#3cde8c0
   )selection_statement#3cde940
  )compound_statement#3cde1e0
 )function_definition#3cde740
)translation_unit#3cde920

OP asks for an example of how to do his transformation. As stated earlier, DMS allows source-to-source transformation patterns of the form "if you see this, replace it by that", stated in the surface syntax of the target language being manipulated (in this case, the GCC4 dialect of C). The value of such transformations is that they are much easier to write than traditional AST-hacking code done with procedure calls.

To achieve OP's effect, he needs the following DMS transformation:

    default domain C~GCC4; // tells DMS to use C domain with GCC4 dialect

    rule transform_pp_conditional_else(c1: condition, c2: condition,
                                       s1: statements, s2: statements, 
                                       pc1: preprocessor_condition):
         statement -> statement

      "if (c1) { s1 }
       #ifdef pc1
       else if (c2) { s2 }
       #endif"
   ->
       "{ bool test=c1;
          if (test) { s1 }
          #ifdef pc1
          if (!test && c2) { s2 }
          #endif
        }"

The default domain declaration tells DMS that the rules that follow are for the GCC4 dialect of C. The transformation is called a "rule" in DMS; it is parameterized by typed subtrees (here c1, c2, s1, s2, and pc1). The metaquotes "..." separate DMS rewrite rule syntax from C~GCC4 syntax. I think the rest of it is clear enough.
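To see what the rule amounts to on test.c, here is a hand-instantiated sketch in plain C with c1 = condition_1, s1 = x++;, pc1 = expression_1, c2 = condition_2, and s2 = y++;. This is written by hand, not output produced by DMS. The function names `before`, `after`, `count_mismatches`, and `check_rule` are assumptions for illustration, and the conditions are passed as parameters so both forms can be exercised. Note that the right-hand side uses `bool`, so the transformed file needs `<stdbool.h>` (or C23).

```c
#include <assert.h>
#include <stdbool.h>

/* Counters standing in for "lines of code 1" and "lines of code 2". */
int x, y;

/* Left-hand side of the rule, instantiated for test.c. */
void before(bool condition_1, bool condition_2)
{
    (void)condition_2;  /* unused when expression_1 is not defined */
    if (condition_1) { x++; }
#ifdef expression_1
    else if (condition_2) { y++; }
#endif
}

/* Right-hand side of the rule: c1 is evaluated once and saved, so the
   second test no longer depends on the else. */
void after(bool condition_1, bool condition_2)
{
    (void)condition_2;  /* unused when expression_1 is not defined */
    { bool test = condition_1;
      if (test) { x++; }
#ifdef expression_1
      if (!test && condition_2) { y++; }
#endif
    }
}

/* Returns the number of input combinations on which the two forms
   leave x and y in different states. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int c1 = 0; c1 <= 1; c1++) {
        for (int c2 = 0; c2 <= 1; c2++) {
            x = y = 0;
            before(c1, c2);
            int bx = x, by = y;
            x = y = 0;
            after(c1, c2);
            if (bx != x || by != y) mismatches++;
        }
    }
    return mismatches;
}

/* Aborts if the two forms ever disagree. */
void check_rule(void)
{
    assert(count_mismatches() == 0);
}
```

Calling `check_rule()` from `main` and compiling the file twice, once as-is and once with `-Dexpression_1`, exercises both preprocessor configurations. That is the point of the question: a single transformed source must stay correct under either setting.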
