Flex: How to define a term to be the first one at the beginning of a line (exclusively)

Problem description

I need some help with a problem I am facing in my Flex code.

My task: write Flex code that recognizes the declaration part of a programming language, described below.

Consider a programming language PL. Its variable definition part is described as follows:

A declaration part must begin with the keyword "var". After this keyword come one or more variable names separated by commas ",". Then comes a colon ":", followed by the variable type (real, boolean, integer, or char in my example) and a semicolon ";". After that, more variables may be declared on a new line (variable names separated by commas ",", followed by a colon ":", the variable type, and a semicolon ";"), but the "var" keyword must not be used again at the beginning of the new line (the "var" keyword is written only once).

For example:

var number_of_attendants, sum: integer;
ticket_price: real;
symbols: char;

Concretely, I do not know how to specify that every declaration part must start with the 'var' keyword, and only there. At the moment, if I begin a declaration part by directly declaring a variable, say x (without writing "var" at the beginning of the line), no error is reported, which is not what I want.

My current Flex code is below:

%{
#include <stdio.h>
%}
VAR_DEFINER "var"
VAR_NAME [a-zA-Z][a-zA-Z0-9_]*
VAR_TYPE "real"|"boolean"|"integer"|"char"
SUBEXPRESSION [{VAR_NAME}[","{VAR_NAME}]*":"[ \t\n]*{VAR_TYPE}";"]+
EXPRESSION {VAR_DEFINER}{SUBEXPRESSION}
%%
^{EXPRESSION}                 { 
                                  printf("This is not a well-syntaxed expression!\n"); 
                                  return 0;
                            }
{EXPRESSION}                        printf("This is a well-syntaxed expression!\n");
";"[ \t\n]*{VAR_DEFINER}    {
                                  printf("The keyword 'var' is defined once at the beginning of a new line. You can not use it again\n");
                                  return 0;
                            }
{VAR_DEFINER}                  printf("A keyword: %s\n", yytext);
^{VAR_DEFINER}                 printf("Each and every declaration part must start with the 'var' keyword.\n");
{VAR_TYPE}";"                     printf("The variable type is: %s\n", yytext);
{VAR_NAME}                        printf("A variable name: %s\n", yytext);
","/[ \t\n]*{VAR_NAME}            /* eat up commas */
":"/[ \t\n]*{VAR_TYPE}";"         /* eat up single colon */
[ \t\n]+                          /* eat up whitespace */
.                           {
                                  printf("Unrecognized character: %s\n", yytext);
                                  return 0;
                            }
%%
int main(int argc, char **argv)
{
    ++argv, --argc;   /* skip over the program name */
    if (argc > 0)
        yyin = fopen(argv[0], "r");
    else
        yyin = stdin;
    yylex();
    return 0;
}

I hope I have been as clear as possible.

I am looking forward to reading your answers!

Recommended answer

You seem to be trying to do too much in the scanner. Do you really have to do everything in Flex? In other words, is this an exercise in the advanced use of Flex, or is it a problem that could be solved with more appropriate tools?

I've read that the first Fortran compiler, created in the 1950s, took 18 staff-years to build. Today, "a substantial compiler can be implemented even as a student project in a one-semester compiler design course", as the Dragon Book from 1986 says. One of the main reasons for this increased efficiency is that we have learned how to divide the compiler into modules that can be constructed separately. The first two such parts, or phases, of a typical compiler are the scanner and the parser.

The scanner, or lexical analyzer, can be generated by Flex from a specification file, or constructed in some other way. Its job is to read the input, which consists of a sequence of characters, and split it into a sequence of tokens. A token is the smallest meaningful part of the input language, such as a semicolon, the keyword var, the identifier number_of_attendants, or the operator <=. You should not use the scanner to do more than that.

Here is how I would write a simplified Flex specification for your tokens:

[ \t\n] { /* Ignore all whitespace */ }
var { return VAR; }
real { return REAL; }
boolean { return BOOLEAN; }
integer { return INTEGER; }
char { return CHAR; }
[a-zA-Z][a-zA-Z0-9_]* { return VAR_NAME; }
. { return yytext[0]; }

The sequence of tokens is then passed on to the parser, or syntactical analyzer. The parser compares the token sequence with the grammar for the language. For example, the input var number_of_attendants, sum : integer; consists of the keyword var, a comma-separated list of variables, a colon, a data type keyword, and a semicolon. If I understand what your input is supposed to look like, perhaps this grammar would be correct:

program : VAR typedecls ;
typedecls : typedecl | typedecls typedecl ;
typedecl : varlist ':' var_type ';' ;
varlist : VAR_NAME | varlist ',' VAR_NAME ;
var_type : REAL | BOOLEAN | INTEGER | CHAR ;

This grammar happens to be written in a format that Bison, a parser generator that is often used together with Flex, can understand.

If you separate your solution into a lexical part, using Flex, and a grammar part, using Bison, your life is likely to be much simpler and happier.
