C++ scanner with flex and bison: undefined reference to yylex and yytext

Problem description

I'm trying to make a simple parser that takes BNF grammars as input, using flex and bison with C++. I'm getting some compile-time errors. I have searched other questions with similar errors and corrected my files to match theirs, but I'm still getting the errors.

Here is my lex.l:

%{
#include <iostream>
#include <string>
#define tkerror -1

#include "sintactic.tab.h"

using namespace std;
extern int row =1;
int col=0;
%}

%option caseful
%option noyywrap
%option yylineno
%option c++

ignora " "|\t|\n
ID [a-zA-Z]([a-zA-Z0-9_])*

%%

{ignora}+       {;}
"terminal"      {col = col + strlen(yylval.cad); strcpy(yylval.cad, yytext); return tkterminal;}
";"             {col = col + strlen(yylval.cad); strcpy(yylval.cad, yytext); return tkptocma;}
","             {col = col + strlen(yylval.cad); strcpy(yylval.cad, yytext); return tkcma;}
"no"            {col = col + strlen(yylval.cad); strcpy(yylval.cad, yytext); return tkno;}
"iniciar"       {col = col + strlen(yylval.cad); strcpy(yylval.cad, yytext); return tkiniciar;}
"con"           {col = col + strlen(yylval.cad); strcpy(yylval.cad, yytext); return tkcon;}
"="             {col = col + strlen(yylval.cad); strcpy(yylval.cad, yytext); return tkasignar;}
".rule"         {col = col + strlen(yylval.cad); strcpy(yylval.cad, yytext); return tkrule;}
"|"             {col = col + strlen(yylval.cad); strcpy(yylval.cad, yytext); return tkor;}
"%"             {col = col + strlen(yylval.cad); strcpy(yylval.cad, yytext); return tksep;}
"EPSILON"       {col = col + strlen(yylval.cad); strcpy(yylval.cad, yytext); return tkeps;}
{ID}            {col = col + strlen(yylval.cad); strcpy(yylval.cad, yytext); return tkid;}
[\r\n]          {row++; col = 0;}
.           {return tkerror;}

%%

Here is my sintactic.y:

%{
#include <iostream>
#include <string>
#include "lex.yy.cc"
using namespace std;

extern int row;
extern int yylineno;
extern int col;
extern char* yytext;

extern "C" int yylex();

int yyerror(const char* men)
{
   string output = yytext;
   std::cout<<"Error sintactico "<<output<<" linea "<<row<<" columna "<<col<<endl;

   return 0;
}

%}

%union{
    int entero;
    char cad [256];
}

%token<cad> TOK_EMPTY_LINE;

%token<cad> tkterminal;
%token<cad> tkptocma;
%token<cad> tkno;
%token<cad> tkiniciar;
%token<cad> tkcon;
%token<cad> tkasignar;
%token<cad> tkrule;
%token<cad> tkor;
%token<cad> tksep;
%token<cad> tkeps;
%token<cad> tkid;
%token<cad> tkcma;

%type<nodo> Lenguaje
%type<nodo> Area_Declaraciones;
%type<nodo> Area_NTInicial;
%type<nodo> Area_Gramatica;
%type<nodo> Lista_Declaraciones;
%type<nodo> Declaracion;
%type<nodo> Dec_Terminal;
%type<nodo> Dec_NoTerminal;
%type<nodo> Ids;
%type<nodo> Producciones;
%type<nodo> Produccion;
%type<nodo> Izquierda;
%type<nodo> Derecha;
%type<nodo> Id_Eps;

%%

Lenguaje: Area_Declaraciones tksep Area_NTInicial tksep Area_Gramatica;

Area_Declaraciones: Lista_Declaraciones;
Lista_Declaraciones: Declaracion Lista_Declaraciones
    | ;
Declaracion: Dec_Terminal
    | Dec_NoTerminal;
Dec_Terminal: tkterminal Ids tkptocma;
Dec_NoTerminal: tkno tkterminal Ids tkptocma;

Ids: tkid tkcma Ids
    | tkid;

Area_NTInicial: tkiniciar tkcon tkid tkptocma;

Area_Gramatica: Producciones;
Producciones: Produccion Producciones
    | ;
Produccion: Izquierda tkasignar Derecha tkptocma;
Izquierda: Ids tkrule;
Derecha: Id_Eps Derecha
    | Id_Eps
    | tkor Derecha;
Id_Eps: tkid
    | tkeps;

%%

Compiling from the console:

bison -d sintactic.y
flex lex.l
g++ sintactic.tab.c -lfl -o scanner.sh

These are the errors I get:

/usr/lib/x86_64-linux-gnu/libfl_pic.a(libmain.o): On the function `main':(.text.startup+0x9): undefined reference to `yylex'

/tmp/ccatti3x.o: On the function `yyerror(char const*)': sintactic.tab.c:(.text+0x23): undefined reference to `yytext[abi:cxx11]'

/tmp/ccatti3x.o: On the function `yyparse()': sintactic.tab.c:(.text+0x409): undefined reference to `yylex()'

collect2: error: ld returned 1 exit status

I will add the actions to sintactic.y once I can compile this without errors. I've seen other examples with almost the same code as mine, and they seem to compile fine. I'm not used to C++ or flex/bison, so I don't really know where these errors might come from.

Recommended answer

  1. Remove %option c++. It isn't helping you at all. Without it you can still write C++ code in the scanner, including C++ datatypes, subject to the usual limitation on union members (you cannot put anything with a non-trivial destructor in the semantic-value union). With the option, flex generates a completely different API, the yyFlexLexer class, which does not include the global function yylex. You can use that API if you want to (it's documented in the flex manual), but you won't find many examples. It is much easier to use the standard C interface, which is, in fact, what you're trying to use.

Change the declaration of yylex in the bison file from

extern "C" int yylex();

to

int yylex();

Declaring it as C changes the way its name is represented internally (C++ name mangling is not applied). If you declare a function as extern "C" in some C++ file, you must do so in all of them, including the one in which it is defined (in this case, the lexical scanner).
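A minimal sketch of the all-C++ setup recommended above, assuming both generated files are compiled with g++: the caller and the definition see the same plain `int yylex();` declaration, so the linker can match them. The hand-written `yylex` body and the `TK_ID`/`TK_PTOCMA` codes are hypothetical stand-ins for what flex and bison would generate, and `count_tokens` only imitates how `yyparse()` pulls tokens.

```cpp
#include <cstddef>

// Declaration as the parser sees it: ordinary C++ linkage, no extern "C".
// In a real project this line would live in a header shared by both files.
int yylex();

// Hypothetical token codes; bison would normally generate these in sintactic.tab.h.
enum { TK_ID = 258, TK_PTOCMA = 259 };

// Hypothetical stand-in for the flex-generated scanner. Its definition has the
// same (C++) linkage as the declaration above, so the call below resolves.
int yylex() {
    static const int tokens[] = {TK_ID, TK_PTOCMA, 0};  // 0 means end of input
    static std::size_t next = 0;
    return tokens[next] == 0 ? 0 : tokens[next++];
}

// Imitates the parser's loop: keep calling yylex() until it returns 0.
int count_tokens() {
    int n = 0;
    while (yylex() != 0) ++n;
    return n;
}
```

If only one side were marked extern "C", the two names would no longer match after mangling, which is exactly the "undefined reference to `yylex()'" error in the question.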

You could add the extern "C" declaration to the flex file by defining the YY_DECL macro, but there is no point in this case. It is easier to let yylex be compiled as a C++ function.

The extern in extern int row = 1; is not really necessary. If you declare int row = 1; at file level, it has external linkage, which means you can reference it from a different translation unit (source file). You would normally use extern to indicate that an identifier is defined in a different translation unit; in that case, you don't initialize it, since it is initialized in the source file in which it is defined.
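A small sketch of the declaration/definition split described above. In a real build the `extern` line would sit in the parser file and the definition in the scanner file; both appear in one file here only so the example compiles by itself, and `current_row` is a hypothetical helper.

```cpp
// Parser side: a declaration only. It says `row` exists and is defined in
// some translation unit; there is no initializer here.
extern int row;

// Hypothetical helper that reads the variable through the declaration above.
int current_row() { return row; }

// Scanner side: the one definition, with its initializer. No `extern` is
// needed, since a file-scope variable already has external linkage.
int row = 1;
```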

Do not #include "lex.yy.cc" in your bison file. I've seen professors recommend that practice to avoid teaching their students about multiple translation units in a C (or C++) project, but it's a dead end. Learn how to do separate compilation; any good C textbook describes it. (Or just put the names of both the bison- and flex-generated C files on the compilation line.) If you take my advice in point 1, the default name of the flex-generated scanner will be lex.yy.c, but don't rely on that: use flex's -o option to give the generated file a meaningful name (with a .cc extension if you want to compile it as C++).

Using yytext in your bison actions is not usually a good idea, but if you are going to use it (for example, in an error report), make sure you declare it as char*.
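A minimal sketch of an error reporter that declares `yytext` the way flex's default (`%pointer`) scanner defines it, `char *yytext`. The `error_message` helper is hypothetical and exists only so the message can be checked. The definitions of `yytext`, `row` and `col` at the bottom are stand-ins for what the scanner file would provide, included only so the sketch links on its own.

```cpp
#include <iostream>
#include <string>

// Matches the declaration a default flex scanner uses for the token text.
extern char *yytext;
extern int row, col;   // defined in the scanner in a real build

// Hypothetical helper: builds the message so it can be tested separately.
std::string error_message(const char *msg) {
    return std::string(msg) + " near '" + (yytext ? yytext : "") + "' at line "
           + std::to_string(row) + ", column " + std::to_string(col);
}

void yyerror(const char *msg) {
    std::cerr << error_message(msg) << '\n';
}

// Stand-ins for the scanner's definitions, so this file links by itself.
static char sample[] = "iniciar";
char *yytext = sample;
int row = 3, col = 7;
```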

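Don't use strlen(yytext) in your flex actions. Flex helpfully puts the length of the token in yyleng, so you don't have to rescan the text to figure out how long it is. (Note also that your actions call strlen(yylval.cad) before the strcpy, so they add the length of the previous token, not the current one.)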
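A minimal sketch of column tracking driven by the token length. The helper `advance` is hypothetical; in the flex file you would simply pass `yyleng`, as the comment shows, instead of measuring the text again.

```cpp
int col = 0;

// Hypothetical helper for each token rule: remember where the token starts,
// then advance by its length. In the flex file the length comes from yyleng:
//     "iniciar"   { int start = col; col += yyleng; return tkiniciar; }
int advance(int token_length) {
    int start = col;
    col += token_length;
    return start;
}
```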

Large fixed-length buffers in your semantic type (char cad[256]) are really not a good idea. (Small fixed-length buffers are even worse.) First, they blow up the size of the parser stack, since every stack slot must have room for the buffer. Also, the entire buffer is copied every time a stack slot is copied, which wastes cycles. Most importantly, fixed-length buffers are just asking for trouble: see buffer overrun. Dynamically allocating memory to hold a copy of the token is easy: strdup(yytext) is sufficient on most modern systems. (strdup was not part of the standard C library before C23, but it has long been required by POSIX.) The tricky part is knowing when to free() the allocated memory, but that should become obvious as you write your actions.
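A minimal sketch of a pointer-based semantic value, assuming a POSIX system where `strdup` is available. `SemanticValue`, `make_identifier` and `release` are hypothetical names. In a real grammar the union would be declared with `%union` and the copy made in the scanner action, as the comment shows.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical semantic-value union: a pointer instead of char cad[256],
// so each parser stack slot only holds a pointer.
union SemanticValue {
    int entero;
    char *str;
};

// What an identifier rule would do, e.g.
//     {ID}  { yylval.str = strdup(yytext); return tkid; }
SemanticValue make_identifier(const char *text) {
    SemanticValue v;
    v.str = strdup(text);   // heap copy sized to the token (POSIX strdup)
    return v;
}

// The parser action that consumes the string owns it and must free it.
void release(SemanticValue &v) {
    std::free(v.str);
    v.str = nullptr;
}
```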

Save yourself a lot of thinking about dynamically allocating and freeing strings by not saving the text of keyword tokens such as "iniciar". You know that tkiniciar corresponds to the string "iniciar", and your parser will never consult the semantic value of that token. You should only need semantic values for identifier tokens and literal constants (and for numeric constants, you could use strtol or a similar function to produce an integer instead of passing the string to bison).
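A minimal sketch of converting a numeric literal in the scanner with `std::strtol`, so the parser receives an `int` rather than a string. The question's grammar has no numeric tokens, so `TK_NUM` and the helper `to_int` are hypothetical.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper for a rule such as
//     [[:digit:]]+  { if (!to_int(yytext, yylval.entero)) { /* report it */ } return TK_NUM; }
// Returns false if the text is not a complete base-10 number that fits in an int.
bool to_int(const char *text, int &out) {
    errno = 0;
    char *end = nullptr;
    long value = std::strtol(text, &end, 10);
    if (errno == ERANGE || value > INT_MAX || value < INT_MIN || *end != '\0')
        return false;
    out = static_cast<int>(value);
    return true;
}
```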

In flex, you can write [[:space:]] if you want to match a single whitespace character. (In addition to space, newline and tab, it also matches \r, \f and \v, but that shouldn't be a problem.) You can also use [[:alpha:]], [[:digit:]] and [[:alnum:]]. These are real character classes, so you can add more characters to the bracket expression; for example, [[:alnum:]_] matches a letter, a digit, or an underscore. You don't need to parenthesize character classes in order to repeat them; an identifier pattern could be [[:alpha:]_][[:alnum:]_]*. See the flex manual section on patterns for more details.
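To see which strings the suggested identifier pattern accepts, here is a sketch that runs the same expression through `std::regex`, whose ECMAScript grammar also understands POSIX class names inside brackets. This only illustrates the pattern's meaning; it is not how flex matches. Unlike the question's `[a-zA-Z]([a-zA-Z0-9_])*`, this pattern also accepts a leading underscore. `is_identifier` is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Same pattern as the flex rule [[:alpha:]_][[:alnum:]_]*, checked against a whole string.
bool is_identifier(const std::string &s) {
    static const std::regex id("[[:alpha:]_][[:alnum:]_]*");
    return std::regex_match(s, id);
}
```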

Remove #define tkerror -1. Returning a negative integer from yylex to a bison parser will not do what you want: bison treats any zero or negative return value as end of input. Instead, declare tkerror as a %token (with no semantic type), so that yylex can return it. (The result will be a syntax error, because no production uses the tkerror token.)
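A minimal sketch of the scanner's fallback once tkerror is a real token. The enum imitates the kind of `yytokentype` enum bison writes to sintactic.tab.h, but the numbers here are made up; bison assigns the real ones. `classify` is a hypothetical stand-in for the rules of the flex file, with the last branch playing the role of the `.` rule.

```cpp
// Hypothetical excerpt of the generated header after adding `%token tkerror`.
enum yytokentype { tkid = 258, tkptocma = 259, tkerror = 260 };

// Hypothetical stand-in for the scanner rules: any character no rule
// recognizes yields tkerror, a positive token code, instead of -1.
int classify(char c) {
    if (c == ';') return tkptocma;
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return tkid;
    return tkerror;
}
```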

How you name your tokens and non-terminals is, of course, completely up to you, but the usual style is to use ALL_CAPS for tokens and lower_case for non-terminals. (Some people like to capitalize, as you do, but most of us use lower case.) Since tokens (but not non-terminals) end up as identifiers in the generated code, they must conform to C naming rules (or C++ rules, as appropriate), and they cannot collide with other identifiers in the generated code (which include, for example, start conditions in the flex file and flex macros like BEGIN). So it is sometimes useful to prefix token names with something like TOK_; in particular, TOK_BEGIN and TOK_END are pretty common. TOK_EMPTY_LINE seems unnecessary to me, and in any case your flex file never generates a token with that name, so you might as well eliminate it.