AST from C code with preprocessor directives


Question

How can I build an AST (Abstract Syntax Tree) from GCC C code so that I can make some transformations, like the one below, and then regenerate the code as C source afterward?

    if (condition_1) {
        // lines of code 1
    }
    #ifdef expression_1
    else if (condition_2) {
        // lines of code 2
    }
    #endif

into

    bool test = condition_1;
    if (test) {
        // lines of code 1
    }
    #ifdef expression_1
    if (!(test) && condition_2) {
        // lines of code 2
    }
    #endif
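To check that the requested rewrite preserves behavior, here is a small C sketch; the function names `original`, `transformed` and `shapes_agree` are hypothetical, and each branch just records which block ran. Because `&&` short-circuits, `condition_2` is still evaluated only when `condition_1` is false, just as in the original `else if`. Note that before C23, `bool` requires `<stdbool.h>`. Compiling it once as-is and once with `-Dexpression_1` exercises both preprocessor configurations.

```c
#include <assert.h>
#include <stdbool.h>

/* Each function returns which block ran: 0 = none,
   1 = "lines of code 1", 2 = "lines of code 2". */

/* The question's original shape. */
int original(bool condition_1, bool condition_2) {
    int ran = 0;
    if (condition_1) {
        ran = 1;
    }
#ifdef expression_1
    else if (condition_2) {
        ran = 2;
    }
#endif
    (void)condition_2;  /* avoids an unused warning when expression_1 is undefined */
    return ran;
}

/* The requested shape: condition_1 is evaluated once and saved in test. */
int transformed(bool condition_1, bool condition_2) {
    int ran = 0;
    bool test = condition_1;
    if (test) {
        ran = 1;
    }
#ifdef expression_1
    if (!(test) && condition_2) {
        ran = 2;
    }
#endif
    (void)condition_2;
    return ran;
}

/* Compare both shapes on all four input combinations. */
bool shapes_agree(void) {
    for (int c1 = 0; c1 <= 1; c1++) {
        for (int c2 = 0; c2 <= 1; c2++) {
            if (original(c1, c2) != transformed(c1, c2))
                return false;
        }
    }
    return true;
}
```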

Solution

GCC itself will build ASTs, but only after expanding the preprocessor directives, so by then the preprocessor conditionals are gone. Reinstalling them after you have done the transformations will be extremely hard, and transformations that involve the conditionals themselves will be impossible. So GCC itself is not a good way to get the ASTs you want.

If you want to parse your code example (the conditional wrapped around the else if is really nice!), you need a reengineering parser. These are parsers designed to support refactoring. Such parsers must capture more than traditional parsers do, e.g., the column numbers of tokens and the original format of lexical items, so that source text can be regenerated from the modified tree. For C, such a parser must capture the preprocessor directives, too. Parsers like that are pretty rare.
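To make "capture the preprocessor directives, too" concrete, here is a minimal lexer sketch in C. It is not how DMS works internally; the token kinds and the `lex` function are hypothetical. It records a 1-based line and column for every token, keeps the original spelling, and turns a line whose first non-blank character is `#` into a single directive token instead of expanding it. It deliberately ignores numbers, strings, comments and line continuations, and counts a tab as one column.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token kinds for this sketch. */
typedef enum { TOK_DIRECTIVE, TOK_IDENT, TOK_PUNCT } TokKind;

typedef struct {
    TokKind kind;
    int line, column;  /* 1-based position in the original source */
    char text[64];     /* original spelling (truncated if longer) */
} Token;

/* Split src into at most max tokens and return how many were produced.
   Directive lines are kept whole as TOK_DIRECTIVE, not expanded. */
int lex(const char *src, Token *toks, int max) {
    assert(toks != NULL || max == 0);
    int n = 0, line = 1, col = 1, at_line_start = 1;
    const char *p = src;
    while (*p != '\0' && n < max) {
        if (*p == '\n') {
            line++; col = 1; at_line_start = 1; p++;
            continue;
        }
        if (*p == ' ' || *p == '\t') {
            col++; p++;
            continue;
        }
        Token *t = &toks[n++];
        t->line = line;
        t->column = col;
        size_t len = 1;
        if (at_line_start && *p == '#') {
            t->kind = TOK_DIRECTIVE;
            len = strcspn(p, "\n");  /* keep the whole directive line */
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            t->kind = TOK_IDENT;
            while (isalnum((unsigned char)p[len]) || p[len] == '_')
                len++;
        } else {
            t->kind = TOK_PUNCT;     /* one character at a time, e.g. '+' '+' */
        }
        at_line_start = 0;
        size_t keep = len < sizeof t->text - 1 ? len : sizeof t->text - 1;
        memcpy(t->text, p, keep);
        t->text[keep] = '\0';
        p += len;
        col += (int)len;
    }
    return n;
}
```

On input where `#ifdef expression_1` sits on line 4 behind two spaces, it comes out as one directive token at line 4, column 3, which a parser can then attach to the tree instead of losing it.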

One such reengineering parser is our DMS Software Reengineering Toolkit with its C front end, which handles many dialects of C, including GCC 2/3/4/5. It is designed explicitly to capture preprocessor conditionals (including your specific example). DMS can also carry out the changes themselves, as source-to-source transformations.

For a version of the OP's example modified to be syntactically legal, placed in test.c:

void main () {
  if (condition_1) {
     x++; 
  }
  #ifdef expression_1
  else if (condition_2) {
         y++;
       }
  #endif
}

... the DMS C~GCC4 parser (out of the box) produces the following AST:

C:\DMS\Domains\C\GCC4\Tools\Parser\Source>run ..\domainparser ++AST C:\temp\test.c
C~GCC4 Domain Parser Version 3.0.1(28449)
Copyright (C) 1996-2015 Semantic Designs, Inc; All Rights Reserved; SD Confidential
Powered by DMS (R) Software Reengineering Toolkit
AST Optimizations: remove constant tokens, remove unary productions, compact sequences
Using encoding Unicode-UTF-8?ANSI +CRLF +1 /^I

28 tree nodes in tree.
(translation_unit@C~GCC4=2#3cde920^0 Line 1 Column 1 File C:/temp/test.c
 (function_definition@C~GCC4=966#3cde740^1#3cde920:1 Line 1 Column 1 File C:/temp/test.c
  (function_head@C~GCC4=967#3047320^1#3cde740:1 Line 1 Column 1 File C:/temp/test.c
   (simple_type_specifier@C~GCC4=686#3047180^1#3047320:1 Line 1 Column 1 File C:/temp/test.c)simple_type_specifier
   (direct_declarator@C~GCC4=852#3047380^1#3047320:2 Line 1 Column 6 File C:/temp/test.c
   |(IDENTIFIER@C~GCC4=1531#3047160^1#3047380:1[`main'] Line 1 Column 6 File C:/temp/test.c)IDENTIFIER
   |(parameter_declaration_clause@C~GCC4=900#30473c0^1#3047380:2 Line 1 Column 12 File C:/temp/test.c)parameter_declaration_clause
   )direct_declarator#3047380
  )function_head#3047320
  (compound_statement@C~GCC4=507#3cde1e0^1#3cde740:2 Line 1 Column 14 File C:/temp/test.c
   (selection_statement@C~GCC4=539#3cde940^1#3cde1e0:1 Line 2 Column 3 File C:/temp/test.c
   |(if_head@C~GCC4=550#30476e0^1#3cde940:1 Line 2 Column 3 File C:/temp/test.c
   | (IDENTIFIER@C~GCC4=1531#30473e0^1#30476e0:1[`condition_1'] Line 2 Column 7 File C:/temp/test.c)IDENTIFIER
   |)if_head#30476e0
   |(compound_statement@C~GCC4=507#3cde700^1#3cde940:2 Line 2 Column 20 File C:/temp/test.c
   | (expression_statement@C~GCC4=503#3047740^1#3cde700:1 Line 3 Column 6 File C:/temp/test.c
   |  (postfix_expression@C~GCC4=205#3047720^1#3047740:1 Line 3 Column 6 File C:/temp/test.c
   |   (IDENTIFIER@C~GCC4=1531#3047700^1#3047720:1[`x'] Line 3 Column 6 File C:/temp/test.c)IDENTIFIER
   |  )postfix_expression#3047720
   | )expression_statement#3047740
   |)compound_statement#3cde700
   |(if_directive@C~GCC4=1088#3cde7a0^1#3cde940:3 Line 5 Column 3 File C:/temp/test.c
   | ('#'@C~GCC4=1548#3cde820^1#3cde7a0:1[Keyword:0] Line 5 Column 3 File C:/temp/test.c)'#'
   | (IDENTIFIER@C~GCC4=1531#3cde1c0^1#3cde7a0:2[`expression_1'] Line 5 Column 10 File C:/temp/test.c)IDENTIFIER
   | (new_line@C~GCC4=1578#3cde800^1#3cde7a0:3[Keyword:0] Line 5 Column 22 File C:/temp/test.c)new_line
   |)if_directive#3cde7a0
   |(selection_statement@C~GCC4=527#3cde840^1#3cde940:4 Line 6 Column 8 File C:/temp/test.c
   | (IDENTIFIER@C~GCC4=1531#3047340^1#3cde840:1[`condition_2'] Line 6 Column 12 File C:/temp/test.c)IDENTIFIER
   | (compound_statement@C~GCC4=507#3cde860^1#3cde840:2 Line 6 Column 25 File C:/temp/test.c
   |  (expression_statement@C~GCC4=503#3cde8a0^1#3cde860:1 Line 7 Column 12 File C:/temp/test.c
   |   (postfix_expression@C~GCC4=205#3cde880^1#3cde8a0:1 Line 7 Column 12 File C:/temp/test.c
   |   |(IDENTIFIER@C~GCC4=1531#3cde780^1#3cde880:1[`y'] Line 7 Column 12 File C:/temp/test.c)IDENTIFIER
   |   )postfix_expression#3cde880
   |  )expression_statement#3cde8a0
   | )compound_statement#3cde860
   |)selection_statement#3cde840
   |(endif_directive@C~GCC4=1092#3cde8c0^1#3cde940:5 Line 9 Column 3 File C:/temp/test.c
   | ('#'@C~GCC4=1548#3cde900^1#3cde8c0:1[Keyword:0] Line 9 Column 3 File C:/temp/test.c)'#'
   | (new_line@C~GCC4=1578#3cde8e0^1#3cde8c0:2[Keyword:0] Line 9 Column 9 File C:/temp/test.c)new_line
   |)endif_directive#3cde8c0
   )selection_statement#3cde940
  )compound_statement#3cde1e0
 )function_definition#3cde740
)translation_unit#3cde920
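In the dump, the if_directive and endif_directive nodes are ordinary children of the outer selection_statement, sitting next to the nested else if. The C sketch below assumes a hypothetical, much simpler node layout built on the same idea and shows how a prettyprinter can regenerate the conditional from such a tree. It is only an illustration; it does not reproduce DMS's node format or its printer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical node kinds, loosely modeled on the dump above. */
typedef enum { TEXT, SELECTION, IF_DIRECTIVE, ENDIF_DIRECTIVE } Kind;

typedef struct Node {
    Kind kind;
    const char *text;      /* TEXT: condition or statement; IF_DIRECTIVE: macro name */
    struct Node *kids[5];  /* SELECTION: condition, body, then directives/else-if */
    int nkids;
} Node;

/* Append s to buf, truncating if the buffer (capacity cap) fills up. */
static void emit(char *buf, size_t cap, const char *s) {
    assert(strlen(buf) < cap);
    strncat(buf, s, cap - strlen(buf) - 1);
}

/* Regenerate C text from the tree, putting directives back where they sit. */
void print_node(const Node *n, char *buf, size_t cap) {
    switch (n->kind) {
    case TEXT:
        emit(buf, cap, n->text);
        break;
    case SELECTION:
        emit(buf, cap, "if (");
        print_node(n->kids[0], buf, cap);
        emit(buf, cap, ") { ");
        print_node(n->kids[1], buf, cap);
        emit(buf, cap, " }\n");
        for (int i = 2; i < n->nkids; i++) {
            if (n->kids[i]->kind == SELECTION)
                emit(buf, cap, "else ");  /* a nested selection is the else-if */
            print_node(n->kids[i], buf, cap);
        }
        break;
    case IF_DIRECTIVE:
        emit(buf, cap, "#ifdef ");
        emit(buf, cap, n->text);
        emit(buf, cap, "\n");
        break;
    case ENDIF_DIRECTIVE:
        emit(buf, cap, "#endif\n");
        break;
    }
}

/* The test.c conditional as a tree; statement bodies kept as plain text. */
const Node *example_tree(void) {
    static Node c1 = {TEXT, "condition_1", {0}, 0};
    static Node s1 = {TEXT, "x++;", {0}, 0};
    static Node c2 = {TEXT, "condition_2", {0}, 0};
    static Node s2 = {TEXT, "y++;", {0}, 0};
    static Node dir_if = {IF_DIRECTIVE, "expression_1", {0}, 0};
    static Node dir_endif = {ENDIF_DIRECTIVE, NULL, {0}, 0};
    static Node inner = {SELECTION, NULL, {&c2, &s2}, 2};
    static Node outer = {SELECTION, NULL, {&c1, &s1, &dir_if, &inner, &dir_endif}, 5};
    return &outer;
}
```

Rearranging a SELECTION's children and printing again is, in miniature, the "modify the tree, then regenerate the source" loop described above; a real reengineering printer would also use the recorded positions to preserve the original layout.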

EDIT: The OP asked for an example of how to do his transformation. As stated earlier, DMS supports source-to-source transformation patterns of the form "if you see this, replace it by that", written in the surface syntax of the target language being manipulated (in this case, the GCC4 version of C). The value of such transformations is that they are much easier to write than traditional AST-hacking code made of procedure calls.

To achieve the OP's effect, he needs the following DMS transformation:

    default domain C~GCC4; // tells DMS to use C domain with GCC4 dialect

    rule transform_pp_conditional_else(c1: condition, c2: condition,
                                       s1: statements, s2: statements, 
                                       pc1: preprocessor_condition):
         statement -> statement

      "if (\c1) { \s1 }
       #ifdef \pc1
       else if (\c2) { \s2 }
       #endif"
   ->
       "{ bool test=\c1;
          if (test) { \s1 }
          #ifdef \pc1
          if (!test && \c2) { \s2 }
          #endif
        }"

The default domain declaration tells DMS that the following rules are for GCC4. A transformation is called a "rule" in DMS; it is parameterized by typed subtrees (c1: condition, s1: statements, and so on). The metaquotes "..." distinguish DMS rewrite-rule syntax from C~GCC4 syntax. I think the rest of it is clear enough.
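Once the left-hand pattern has matched, the right-hand side behaves like a template filled in with whatever was bound to \c1, \s1, \pc1, \c2 and \s2. The C sketch below mimics only that instantiation step, on plain strings; the `Bindings` struct and `instantiate` function are hypothetical, and a real rewrite engine would substitute subtrees rather than text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bindings for the rule's metavariables, flattened to text. */
typedef struct {
    const char *c1, *s1, *pc1, *c2, *s2;
} Bindings;

/* Fill the right-hand-side template with the bound pieces.
   Returns snprintf's result: the length the full output needs. */
int instantiate(const Bindings *b, char *out, size_t cap) {
    assert(b != NULL && out != NULL);
    return snprintf(out, cap,
                    "{ bool test=%s;\n"
                    "  if (test) { %s }\n"
                    "  #ifdef %s\n"
                    "  if (!test && %s) { %s }\n"
                    "  #endif\n"
                    "}\n",
                    b->c1, b->s1, b->pc1, b->c2, b->s2);
}
```

The part a rewrite engine does for you is everything before this step: finding the #ifdef-wrapped else if in the tree and binding each piece with the right syntactic type.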
