Flex和野牛项目 [英] Project on flex and bison

查看:101
本文介绍了Flex和野牛项目的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经使用 flex bison 来制作词法分析器解析器,以用于 EBNF语法.这项工作完成了!我的意思是,当我将文件写入编写的程序时,可以看到程序是否有错误.如果没有,我可以根据自己使用的语法在屏幕上看到整个程序.我没问题.

I have used flex and bison in order to make a lexical analyzer and a parser for an EBNF grammar. This work is done! I mean, when i put a file with a program I write, I can see if the program has mistakes. If it doesn't, I can see the whole program in my screen based on the grammar i have used. I have no problem in this.

现在,我想使用循环处理和循环展开.我应该更改哪一部分?词法分析器?解析器?还是解析器之后的主力?以及如何?

Now, I want to use loop handling and loop unrolling. Which part should I change? The lexical analyzer? The parser? Or the main after the parser? And how?

推荐答案

简介

由于我们看不到一段代码,无法看到您如何在解析器中处理循环并输出代码,而您可能想展开的特定循环的示例很难提供更多信息比已经给出的更详细的建议.除了已经阅读您的问题的人以外,全球没有其他任何经验丰富的编译器作者或老师!因此,我们将需要探索其他方法来解释如何解决此类问题.

Introduction

As we don't have sight of a piece of your code to see how you are handling a loop in the parser and outputting code, and an example of a specific loop that you might want unrolled it is difficult to give any more detailed advice than that already given. There are unlikely to be any more experienced compiler writers or teachers anywhere on the globe than those already reading your question! So we will need to explore other ways to explain how to solve a problem like this.

人们常常无法发布代码示例,因为他们从作为课堂练习的一部分或从开放源代码存储库中提供的重要代码库入手,而他们并不完全了解其工作原理查找要发布的适当代码片段.让我们想象一下,您已经拥有了用于实际语言的可运行编译器的完整源代码,并且想要向该现有的可运行编译器中添加一些循环优化,然后您可能会说:什么源,如何显示某些源? " (因为实际上是成千上万的代码行).

It often happens that people can't post examples of their code because they started with a significant code base provided as part of a class exercise or from an open source repository, and they do not fully understand how it works to be able to find appropriate code fragments to post. Let's imagine that you had the complete source of a working compiler for a real language and wanted to add some loop optimisations to that existing, working compiler, you might then say, as you did, "what source, how can I show some source?" (because in actuality it is many tens of thousands of lines of code).

在没有参考代码的情况下,替代方法是创建一个示例,以解释问题和解决方案.这通常是在编译器教科书或编译器类中完成的.我将使用一个类似的简单示例来演示如何使用flex和bison工具实现这种优化.

In the absence of some code to reference the alternative is to create one, as an exemplar, to explain the problem and solution. This is often how it is done in compiler text books or compiler classes. I will use a similar simple example to demonstrate how such optimisations can be achieved using the tools flex and bison.

首先,我们需要定义示例的语言.为了将SO答案的大小限制在合理范围内,该语言必须非常简单.我将使用简单的表达式赋值作为我语言中的唯一语句形式.该语言中的变量将是单个字母,而常量将是正整数.唯一的表达式运算符是加号(+).用我的语言编写的示例程序可能是:

First, we need to define the language of the example. To keep within the reasonable size constraints of a SO answer the language must be very simple. I will use simple assignments of expressions as the only statement form in my language. The variables in this language will be single letters and the constants will be positive integers. The only expression operator is plus (+). An example program in my language might be:

   i = j + k; j = 1 + 2

由编译器生成的输出代码将是具有四个指令LDASTOADDSTP的单个累加器机器的简单汇编程序.为上述语句生成的代码为:

The output code generated by the compiler will be simple assembler for a single accumulator machine with four instructions, LDA, STO, ADD and STP. The code generated for the above statements would be:

LDA j
ADD k
STO i
LDA #1
ADD #2
STO j
STP

LDA将值或变量加载到累加器中时,ADD将变量或值添加到累加器中,STO将累加器存储回一个变量中. STP是程序结束的停止".

Where LDA loads a value or variable into the accumulator, ADD adds a variable or value to the accumulator, STO stores the accumulator back to a variable. STP is "stop" for the end-of-program.

上面显示的语言将需要ID和NUMBER的令牌,并且还应跳过空格.满足以下条件:

The language shown above will need the tokens for ID and NUMBER and should also skip whitespace. The following will suffice:

%{
#define yyterminate() return (END);
%}

digit [0-9]
id [a-z]
ws [\t\n\r ]

%%
{ws}+          /* Skip whitespace */
{digit}+      {yylval = (int)(0l - atol(yytext)); return(NUMBER); }
{id}          {yylval = yytext[0]; return(ID); }
"+"           {return('+'); }
"="           {return('='); }

Gory详细信息
只是一些有关其工作原理的注释.我已使用atol转换整数,以处理读取MAXINT时可能发生的潜在整数溢出.我要否定常量,以便可以轻松地将它们与一个字节中为正数的标识符区分开.我存储单个字符标识符,以避免负担说明符号表代码的负担,从而允许使用非常小的词法分析器,解析器和代码生成器.

Gory details
Just some notes on how this works. I've used atol to convert the integer to allow for deal with potential integer overflow that can occur in reading MAXINT. I'm negating the constants so they can be easily distinguished from the identifiers which will be positive in one byte. I'm storing single character identifiers to avoid having the burden of illustrating symbol table code and thus permit a very small lexer, parser and code generator.

野牛计划

要解析语言并从野牛动作中生成一些代码,我们可以通过以下野牛程序来实现:

The bison program

To parse the language and generate some code from the bison actions we can achieve this by the following bison program:

%{
#include <stdio.h>
%}

%token NUMBER ID END
%%
program : statements END  { printf("STP\n"); return(0) ; }
        ;

statements : statement
        | statements ';' statement 
         ;

statement : ID '=' expression { printf("STO %c\n",$1); }
          |
          ;

expression : operand {
             /* Load operand into accumulator */
             if ($1 <= 0) 
                  printf("LDA #%d\n",(int)0l-$1);
             else printf("LDA %c\n",$1);
            }
           | expression '+' operand  {
             /* Add operand to accumulator */
             if ($3 <= 0) 
                  printf("ADD #%d\n",(int)0l-$3);
             else printf("ADD %c\n",$3);
            }
           ;

operand : NUMBER  
        | ID      
        ;

%%   
#include "lex.yy.c"

方法论的解释
本段适用于那些知道如何执行此操作的人员,并且可能会查询我的示例中使用的方法.尽管这将是代码生成和优化的正统技术,但我还是刻意避免构建一棵树和走树.我想避免在示例中添加所有必要的代码开销来管理树并对其进行遍历.这样,我的示例编译器可能真的很小.但是,被限制为仅使用野牛动作来执行代码生成将我限制在野牛规则匹配的顺序上.这意味着只能真正生成伪机器代码.使用这种方法,一个源到源的例子将很难处理.我选择了一种理想的机器,它介于MU0和无寄存器的PDP/11之间,并且再次具有最少的功能来演示代码的一些优化.

Explanation of methodology
This paragraph is intended for those who know how to do this and might query the approach used in my examples. I've deliberately avoided building a tree and doing a tree walk, although this would be the orthodox technique for code generation and optimisation. I wanted to avoid adding all the necessary code overhead in the example to manage the tree and walk it. This way my example compiler can be really tiny. However, being restricted to only using bison action to perform the code generation limits me to the ordering of the bison rule matching. This meant that only pseudo-machine code could really be generated. A source-to-source example would be less tractable with this methodology. I've chosen an idealised machine that is a cross between MU0 and a register-less PDP/11, again with the bare minimum of features to demonstrate some optimisations of code.

优化

现在,在几行代码中,我们已经为该语言提供了一个有效的编译器,我们可以开始演示添加代码优化的过程是如何工作的. 正如尊敬的@Chris Dodd所说:

Optimisation

Now we have a working compiler for a language in a few lines of code we can start to demonstrate how the process of adding code optimisation might work. As has already been said by the esteemed @Chris Dodd:

如果要在解析后进行程序转换,则应在解析后进行.您可以递增地执行它们(在解析部分输入后从您的bison代码调用变换例程),也可以在解析完成之后进行,但是无论哪种方式,它们都在解析了要变换的程序部分之后发生.

If you want to do program transformations after parsing, you should do them after parsing. You can do them incrementally (calling transform routines from your bison code after parsing part of your input), or after parsing is complete, but either way, they happen after parsing the part of the program you are transforming.

此编译器通过解析部分输入后以增量方式发射代码来工作.识别出每个语句后,将调用野牛动作(在{...}子句中)以生成代码.如果要将其转换为更理想的代码,则必须更改此代码以生成所需的优化.为了实现有效的优化,我们需要清楚地了解要优化的语言功能以及最佳的转换形式.

This compiler works by emitting code incrementally after parsing part of the input. As each statement is recognised the bison action (within the {...} clause) is invoked to generate code. If this is to be transformed into more optimal code it is this code that has to be changed to generate the desired optimisation. To be able to achieve effective optimisation we need a clear understanding of what language features are to be optimised and what the optimal transformation should be.

可以在编译器中完成的常见优化(或代码转换)是恒定折叠.在常量折叠中,编译器用结果替换完全由数字组成的表达式.例如,考虑以下内容:

A common optimisation (or code transformation) that can be done in a compiler is constant folding. In constant folding the compiler replaces expressions made entirely of numbers by the result. For example consider the following:

i = 1 + 2

一种优化方法是将其视为:

An optimisation would be to treat this as:

i = 3

因此,1 + 2是由编译器添加的,因此不会放入生成的代码中以在运行时发生.我们期望得到以下输出:

Thus the addition of 1 + 2 was made by the compiler and not put into the generated code to occur at run time. We would expect the following output to result:

LDA #3
STO i

改进的代码生成器

我们可以通过查找在expression '+' operand的两面都有NUMBER的显式情况来实现改进的代码.为此,我们必须延迟对expression : operand采取的任何操作,以允许该值继续传播.由于可能未对expression的值进行求值,因此我们可能必须在赋值和加法时执行此操作,这会使if语句略有爆炸.但是,我们只需要更改规则statementexpression的操作,如下所示:

Improved Code Generator

We can implement the improved code by looking for the explicit case where we have a NUMBER on both sides of expression '+' operand. To do this we have to delay taking any action on expression : operand to permit the value to be propagated onwards. As the value for an expression might not have been evaluated we have to potentially do that on assignment and addition, which makes for a slight explosion of if statements. We only need to change the actions for the rules statement and expression however, which are as shown below:

statement : ID '=' expression { 
              /* Check for constant expression */
              if ($3 <= 0) printf("LDA #%d\n",(int)0l-$3);
              else 
                 /* Check if expression in accumulator */
                 if ($3 != 'A') printf("LDA %c\n",$3);
              /* Now store accumulator */
              printf("STO %c\n",$1);
              }
          |   /* empty statement */
          ;

expression : operand { $$ = $1 ; }
           | expression '+' operand  {
             /* First check for constant expression */
             if ( ($1 <= 0) && ($3 <= 0)) $$ = $1 + $3 ;
             else { /* No constant folding */
                    /* See if $1 already in accumulator */
                    if ($1 != 'A')
                        /* Load operand $1 into accumulator */
                       if ($1 <= 0) 
                          printf("LDA #%d\n",(int)0l-$1);
                       else printf("LDA %c\n",$1);
                    /* Add operand $3 to accumulator */
                    if ($3 <= 0) 
                       printf("ADD #%d\n",(int)0l-$3);
                    else printf("ADD %c\n",$3);
                    $$ = 'A'; /* Note accumulator result */
                 }
            }
           ;

如果生成结果编译器,您会发现它确实可以生成更好的代码并执行常量折叠转换.

If you build the resultant compiler, you will see that it does indeed generate better code and perform the constant folding transformation.

您在问题中特别询问的转换是循环展开的转换.在循环展开中,编译器将在循环开始和结束条件中查找一些特定的整数表达式值,以确定是否应执行展开的代码转换.然后,编译器将为循环生成两个可能的代码替代序列:展开和标准循环代码.我们可以使用整数增量在此示例微型编译器中演示此概念.

The transformation that you specifically asked about in your question was that of loop unrolling. In loop unrolling the compiler will look for some specific integer expression values in the loop start and end conditions to determine if the unrolled code transformation should be performed. The compiler can will then generate two possible code alternative sequences for loops, the unrolled and standard looping code. We can demonstrate this concept in this example mini-compiler by using integer increments.

如果我们假设机器码中有一条INC指令,该指令会将累加器加1,并且比执行ADD #1指令的速度更快,那么我们可以通过查找特定情况来进一步改进编译器.这涉及评估整数常量表达式并与特定值进行比较,以决定是否应使用替代代码序列-就像在循环展开中一样.例如:

If we imagine that the machine code has an INC instruction which increments the accumulator by one and is faster that performing an ADD #1 instruction, we can further improve the compiler by looking for that specific case. This involves evaluating integer constant expressions and comparing to a specific value to decide if an alternative code sequence should be used - just as in loop unrolling. For example:

i = j + 1

应导致:

LDA j
INC
STO i

最终代码生成器

要更改为n + 1生成的代码,我们只需要重新编码expression语义的一部分,然后测试是否在不折叠常量的情况下使用的常量将是1(在本示例中为否) ).结果代码变为:

Final Code Generator

To change the code generated for n + 1 we only need to recode part of the expression semantics and just test that when not folding constants wether the constant to be used would be 1 (which is negated in this example). The resultant code becomes:

expression : operand { $$ = $1 ; }
           | expression '+' operand  {
             /* First check for constant expression */
             if ( ($1 <= 0) && ($3 <= 0)) $$ = $1 + $3 ;
             else { /* No constant folding */
                    /* Check for special case of constant 1 on LHS */
                    if ($1 == -1) {
                        /* Swap LHS/RHS to permit INC usage */
                        $1 = $3;
                        $3 = -1;
                    }
                    /* See if $1 already in accumulator */
                    if ($1 != 'A')
                        /* Load operand $1 into accumulator */
                        if ($1 <= 0) 
                           printf("LDA #%d\n",(int)0l-$1);
                        else printf("LDA %c\n",$1);
                    /* Add operand $3 to accumulator */
                    if ($3 <= 0) 
                       /* test if ADD or INC */
                       if ($3 == -1) printf("INC\n");
                       else printf("ADD #%d\n",(int)0l-$3);
                    else printf("ADD %c\n",$3);
                    $$ = 'A'; /* Note accumulator result */
                 }
            }
           ;

摘要

在此迷你教程中,我们定义了完整的语言,完整的机器代码,编写了词法分析器,编译器,代码生成器和优化器.它简要演示了代码生成的过程,并指出了(尽管通常)如何执行代码转换和优化.它应该可以使其他(尚未见过)的编译器进行类似的改进,并解决了识别循环展开条件并针对该情况生成特定改进的问题.

Summary

In this mini-tutorial we have defined a whole language, a complete machine code, written a lexer, a compiler, a code generator and an optimiser. It has briefly demonstrated the process of code generation and indicated (albeit generally) how code transformation and optimisation could be performed. It should enable similar improvements to be made in other (as yet unseen) compilers, and has addressed the issue of identifying loop unrolling conditions and generating specific improvements for that case.

还应该明确指出,没有特定的某些程序代码示例来回答问题是多么困难

It should also have made it clear, how difficult it is to answer questions without specific examples of some program code to refer to.

这篇关于Flex和野牛项目的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆