yyllocp->first_line returns uninitialized value in second iteration of a reentrant Bison parser

Question

I have a reentrant parser which takes its input from a string and uses a structure to maintain context. A function is called with different input strings to be parsed. The relevant code of that function is:

int parseMyString(char *inputToBeParsed) {

    // LEXICAL COMPONENT - INITIATE LEX PROCESSING
    yyscan_t scanner;
    YY_BUFFER_STATE buffer;
    yylex_init_extra(&parseSupportStruct, &scanner);
    //yylex_init(&scanner);

    // yy_scan_buffer scans in place; the last two bytes of the buffer must be NUL
    buffer = yy_scan_buffer(inputToBeParsed, i+2, scanner);

    if (buffer == NULL) {
        strcpy(errorStrings, "YY_BUFFER_STATE returned NULL pointer\n");
        return (-1);
    }

    // BISON PART - THE ACTUAL PARSER
    yyparse(scanner, &parseSupportStruct);

    ...

    yylex_destroy(scanner);
    ...
}

My .l options are:

 %option noinput nounput noyywrap 8bit nodefault                                 
 %option yylineno
 %option reentrant bison-bridge bison-locations                                  
 %option extra-type="parseSupportStructType *"

The relevant lines in the .y file are:

  %define api.pure full
  %locations
  %param { yyscan_t scanner }
  %parse-param { parseSupportStructType* parseSupportStruct}
  %code {
    int yylex(YYSTYPE* yylvalp, YYLTYPE* yyllocp, yyscan_t scanner);
    void yyerror(YYLTYPE* yyllocp, yyscan_t unused, parseSupportStructType* parseSupportStruct,  const char* msg);
    char *yyget_text (yyscan_t);
    char *strcpy(char *, const char *);
  }
  %union {
     int numval;
     char *strval;
     double floatval; 
  }

In my parser, in some rules, I try to access yyllocp->first_line. In the first call to parseMyString(...), I get the correct value. The second time, I get some uninitialized value. Do I need to initialize yyllocp->first_line in each call to parseMyString? How and where? I know I have given partial, redacted code, to explain the situation. Will be happy to provide further details.

Using valgrind, I have removed memory leaks to the best of my abilities, but some third-party library issues are beyond my control.

Recommended answer

Nothing in flex or bison will maintain the value of yylloc.

Bison parsers (other than push parsers) will initialise that variable. (If you accept the default location type -- that is, you don't #define YYLTYPE -- yylloc will be initialised to {1, 1, 1, 1}. Otherwise, it will be zero-initialised, whatever that means for whatever type it is.) Bison also produces code which computes the location of a non-terminal based on the locations of the non-terminal's first and last children. Flex's generated code doesn't touch the location object at all.
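
To make the "first and last children" rule concrete, here is a minimal standalone sketch of what Bison's default location action computes. The names Loc and loc_default are invented for this illustration; in a real parser the same work is done by the YYLLOC_DEFAULT macro operating on YYLTYPE values, so treat this as a model of the behaviour rather than Bison's actual code.

```c
/* Stand-in for Bison's default YYLTYPE (hypothetical name). */
typedef struct Loc {
    int first_line, first_column, last_line, last_column;
} Loc;

/*
 * rhs[1..n] are the locations of the rule's children; rhs[0] is the
 * location of the symbol just before the rule, used for empty rules.
 */
Loc loc_default(const Loc *rhs, int n) {
    Loc cur;
    if (n > 0) {
        /* Span from the start of the first child to the end of the last. */
        cur.first_line = rhs[1].first_line;
        cur.first_column = rhs[1].first_column;
        cur.last_line = rhs[n].last_line;
        cur.last_column = rhs[n].last_column;
    } else {
        /* Empty rule: an empty range right after the preceding symbol. */
        cur.first_line = cur.last_line = rhs[0].last_line;
        cur.first_column = cur.last_column = rhs[0].last_column;
    }
    return cur;
}
```

The point for this question is that the parser only combines locations it is handed. If the scanner never fills in the token locations, the non-terminal locations are built from whatever happened to be in memory.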

A flex scanner does automatically maintain yylineno if you enable this feature with

%option yylineno

Flex can usually do that more efficiently than you can, and it handles all the corner cases (yyless, yymore, input(), REJECT). So if you want to track line numbers, I strongly recommend letting flex do it.

But there is one important issue with flex's yylineno support. In a reentrant scanner, the line number is stored in each flex buffer, not in the scanner state object. That's almost certainly the correct place to store it, IMHO, because if you are using multiple buffers, they probably represent multiple input streams, and normally you'll want to cite the number of a line within its file. But yy_scan_buffer does not initialise this field. (And therefore neither do yy_scan_string and yy_scan_bytes, which are just wrappers around yy_scan_buffer.)

So if you are using one of the yy_scan_* interfaces, you should reset yylineno by calling yyset_lineno immediately after yy_scan_*. In your case, this would be:

buffer = yy_scan_buffer(inputToBeParsed, i+2, scanner);
yyset_lineno(1, scanner);

Once you've got yylineno, it's easy to maintain the yylloc object. Flex has a hook which lets you inject code just before the action for any pattern is executed (even if the action is empty), and this hook can be used to automatically maintain yylloc. In another answer, I provided a simple example of this technique (which depends on yylineno being maintained by the flex-generated scanner):

#define YY_USER_ACTION                                             \
  yylloc->first_line = yylloc->last_line;                          \
  yylloc->first_column = yylloc->last_column;                      \
  if (yylloc->last_line == yylineno)                               \
    yylloc->last_column += yyleng;                                 \
  else {                                                           \
    yylloc->last_line = yylineno;                                  \
    yylloc->last_column = yytext + yyleng - strrchr(yytext, '\n'); \
  }

As the notes in that answer indicate, the above is not fully general, but it will work in many circumstances:

This YY_USER_ACTION macro should work for any scanner which does not use yyless(), yymore(), input() or REJECT. Correctly coping with these features is not too difficult but it seemed out of scope here.
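
The update rule in that macro can be tried outside flex. The sketch below restates it as a plain C function so the arithmetic is easy to follow. Loc and loc_advance are names made up for this illustration, and the text, len and lineno parameters stand in for yytext, yyleng and yylineno. The NULL check is an addition for safety that the macro itself does not have.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for Bison's default YYLTYPE (hypothetical name). */
typedef struct Loc {
    int first_line, first_column, last_line, last_column;
} Loc;

/*
 * Same steps as the YY_USER_ACTION macro: the new token starts where the
 * previous one ended, and its end is derived from the token text.
 * lineno is the line number after the token has been matched.
 */
void loc_advance(Loc *loc, const char *text, size_t len, int lineno) {
    loc->first_line = loc->last_line;
    loc->first_column = loc->last_column;
    if (loc->last_line == lineno) {
        /* No newline in the token: just move the column. */
        loc->last_column += (int)len;
    } else {
        /* Token contains a newline: count columns from the last one. */
        const char *nl = strrchr(text, '\n');
        loc->last_line = lineno;
        loc->last_column = nl ? (int)(text + len - nl) : (int)len + 1;
    }
}
```

Starting from {1, 1, 1, 1}, the token "abc" on line 1 yields columns 1 to 4, a following "\n" moves the end to line 2, column 1, and a following "de" yields columns 1 to 3 on line 2. The end column is one past the last character of the token. Note that the rule depends entirely on lineno being correct, which is why a stale yylineno in a new buffer produces wrong locations.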

You cannot handle yyless(), yymore() or REJECT before the action (since before the action it's not possible to know if they will be executed), so a more robust location tracker in an application which used those features would have to include code to fix yylloc:

  • For yyless(), the above code for setting last_line and last_column can be re-executed after the yyless() call, since the flex scanner will fix yyleng and yylineno.

  • For REJECT, it is not possible to insert code after REJECT. The only way to handle it is to keep a backup of yylloc and restore it immediately before the REJECT macro. (I strongly advise against using REJECT. It's extremely inefficient and can almost always be replaced with the combination of a call to yyless() and a start condition.)

  • For yymore(), yylloc is still correct, but the next action must not overwrite the token start position. Getting that right would probably require maintaining a flag to indicate whether or not yymore() had been called.

  • For input(), if you want the characters read to be considered part of the current token, you could advance the end location in yylloc after the call to input() (which requires distinguishing between input() returning a newline, an end-of-file indicator, or a regular character). Alternatively, if you want the characters read with input() to not be considered part of any token, you would need to abandon the idea of using the end position of the previous token as the start position of the current token, which would require keeping a separate position value to be used as the start position of the next token.
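
The yyless(), REJECT and yymore() fixes described above could be combined roughly as follows. This is only a sketch under stated assumptions, not tested against a real scanner. Tracker and all the tracker_* functions are hypothetical names that would be called from YY_USER_ACTION and from the individual rule actions. For simplicity it recounts newlines in the token text instead of consulting yylineno, which is slower but keeps the example self-contained. input() is not covered.

```c
#include <stddef.h>

typedef struct Loc {
    int first_line, first_column, last_line, last_column;
} Loc;

typedef struct Tracker {
    Loc loc;          /* location of the current token */
    Loc backup;       /* location before the current action, for REJECT */
    int more_pending; /* set when the previous action called yymore() */
} Tracker;

void tracker_init(Tracker *t) {
    Loc start = {1, 1, 1, 1};
    t->loc = start;
    t->backup = start;
    t->more_pending = 0;
}

/* Recompute the end position from the token start and len bytes of text. */
void tracker_set_end(Tracker *t, const char *text, size_t len) {
    int line = t->loc.first_line;
    int col = t->loc.first_column;
    for (size_t i = 0; i < len; ++i) {
        if (text[i] == '\n') { ++line; col = 1; }
        else { ++col; }
    }
    t->loc.last_line = line;
    t->loc.last_column = col;
}

/* Intended for YY_USER_ACTION, with yytext and yyleng as arguments. */
void tracker_before_action(Tracker *t, const char *text, size_t len) {
    t->backup = t->loc;
    if (!t->more_pending) {
        /* Normal case: the token starts where the previous one ended. */
        t->loc.first_line = t->loc.last_line;
        t->loc.first_column = t->loc.last_column;
    }
    /* After yymore(), text still begins with the kept prefix, so the
       start position is left alone and only the end is recomputed. */
    t->more_pending = 0;
    tracker_set_end(t, text, len);
}

/* Call right after yyless(n), passing the shortened length. */
void tracker_after_yyless(Tracker *t, const char *text, size_t new_len) {
    tracker_set_end(t, text, new_len);
}

/* Call next to yymore() in the action. */
void tracker_note_yymore(Tracker *t) { t->more_pending = 1; }

/* Call immediately before REJECT to undo this action's update. */
void tracker_before_reject(Tracker *t) { t->loc = t->backup; }
```

Recomputing the end from the token start, rather than adding to the previous end, is what makes the yyless() and yymore() cases safe to re-run: the calculation gives the same answer no matter how many times it is repeated for the same token.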
