Copying entire input line in (f)lex (for better error messages)?

Problem description

As part of a typical parser using yacc (or bison) and lex (or flex), I'd like to copy entire input lines in the lexer so that, if there's an error later, the program can print out the offending line in its entirety and put a caret ^ under the offending token.

To copy the line, I'm currently doing:

%{
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

char *line;        // holds copy of entire line
bool copied_line;
%}

%%

^.+  {
       if ( !copied_line ) {
          free( line );
          line = strdup( yytext );
          copied_line = true;
       }
       REJECT;
     }

/* ... other tokens ... */

\n   { copied_line = false; return END; }

This works, but, from stepping in a debugger, it's really inefficient. What seems to be going on is that the REJECT is causing the lexer to back off one character at a time rather than just jumping to the next possible match.

Is there a better, more efficient way to get what I want?

Recommended answer

Here's a possible definition of YY_INPUT using getline(). It should work as long as no token includes both a newline character and the following character. (A token could include a newline character at the end.) Specifically, current_line will contain the last line of the current token.

On successful completion of the lexical scan, current_line will be freed and the remaining global variables reset so that another input can be lexically analysed. If the lexical scan is discontinued before end of input is reached (for example, because the parse was unsuccessful), an explicit call should be made to reset_current_line() in order to perform these tasks.

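#include <stdio.h>      /* getline, ferror, perror */
#include <stdlib.h>     /* free */
#include <string.h>     /* memcpy */
#include <sys/types.h>  /* ssize_t */

char* current_line = NULL;       /* line most recently read by getline() */
size_t current_line_alloc = 0;   /* allocated size of current_line */
ssize_t current_line_sent = 0;   /* bytes of current_line already given to flex */
ssize_t current_line_len = 0;    /* length of current_line */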

void reset_current_line() {
  free(current_line);
  current_line = NULL;
  current_line_alloc = current_line_sent = current_line_len = 0;
}

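ssize_t refill_flex_buffer(char* buf, size_t max_size) {
  ssize_t avail = current_line_len - current_line_sent;
  if (!avail) {
    /* The previous line has been fully consumed; read the next one. */
    current_line_sent = 0;
    avail = getline(&current_line, &current_line_alloc, stdin);
    if (avail < 0) {
      /* perror() appends ": " and the error text itself. */
      if (ferror(stdin)) { perror("Could not read input"); }
      avail = 0;
    }
    current_line_len = avail;
  }
  /* Hand flex at most max_size bytes; the rest goes out on the next call. */
  if ((size_t)avail > max_size) avail = (ssize_t)max_size;
  memcpy(buf, current_line + current_line_sent, avail);
  current_line_sent += avail;
  if (!avail) reset_current_line();  /* end of input: release the line */
  return avail;
}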

#define YY_INPUT(buf, result, max_size) \
  result = refill_flex_buffer(buf, max_size);

Although the above code does not depend on tracking the current column position, tracking it is important if you want to identify where the current token is in the current line. The following will help, provided you don't use yyless or yymore:

size_t current_col = 0, current_col_end = 0;
/* Call this in any token whose last character is \n,
 * but only after making use of column information.
 */
void reset_current_col() {
  current_col = current_col_end = 0;
}
#define YY_USER_ACTION \
  { current_col = current_col_end; current_col_end += yyleng; }
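
As a rough illustration of how the saved line and the column information might be combined to produce the caret display asked about in the question, here is a minimal sketch. The helper name format_error_context and its signature are hypothetical and not part of the answer above; it assumes the caller passes in current_line, current_col and current_col_end as maintained by the code above, and it writes into a caller-supplied buffer rather than printing directly.

```c
#include <stddef.h>
#include <string.h>

/* Append one character, but only if it still fits (leaving room for '\0'). */
static void put_char(char *out, size_t out_size, size_t *pos, char c) {
  if (*pos + 1 < out_size) out[*pos] = c;
  ++*pos;
}

/* Hypothetical helper (an assumption, not from the original answer):
 * writes `line` followed by a marker line with carets under columns
 * [col, col_end) into `out`. Returns the length the full text would have,
 * in the manner of snprintf(). */
size_t format_error_context(char *out, size_t out_size,
                            const char *line, size_t col, size_t col_end) {
  size_t len = strcspn(line, "\n");  /* getline() keeps the newline; skip it */
  size_t pos = 0, i;

  if (col > len) col = len;
  if (col_end > len) col_end = len;
  if (col_end <= col) col_end = col + 1;  /* always show at least one caret */

  for (i = 0; i < len; ++i) put_char(out, out_size, &pos, line[i]);
  put_char(out, out_size, &pos, '\n');

  /* Repeat tabs from the source line so the carets stay aligned. */
  for (i = 0; i < col; ++i)
    put_char(out, out_size, &pos, line[i] == '\t' ? '\t' : ' ');
  for (i = col; i < col_end; ++i) put_char(out, out_size, &pos, '^');
  put_char(out, out_size, &pos, '\n');

  if (out_size > 0) out[pos < out_size ? pos : out_size - 1] = '\0';
  return pos;
}
```

An error routine such as yyerror could call this with current_line, current_col and current_col_end and then write the buffer to stderr, taking care to do so before reset_current_col() or reset_current_line() has run.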

If you are using this scanner with a parser with lookahead, it may not be sufficient to keep only one line of the input stream, since the lookahead token may be on a subsequent line to the error token. Keeping several retained lines in a circular buffer would be a simple enhancement, but it is not at all obvious how many lines are necessary.
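
One possible shape for such a circular buffer is sketched below. Everything in it is an assumption made for illustration: the names retain_line, retained_line and reset_retained_lines, the use of 1-based line numbers as keys, and in particular the capacity RETAINED_LINES, which is an arbitrary guess since, as noted, the number of lines actually needed is not obvious. The idea would be to call retain_line() each time getline() delivers a new line, and to record the line number alongside each token's location.

```c
#include <stdlib.h>
#include <string.h>

#define RETAINED_LINES 4  /* arbitrary capacity; an assumption, tune as needed */

static char *retained[RETAINED_LINES];
static unsigned long lines_seen = 0;

/* Store a copy of `line`, evicting the oldest retained line if the buffer
 * is full. Returns the 1-based number assigned to the line, or 0 if the
 * copy could not be allocated. */
unsigned long retain_line(const char *line) {
  size_t n = strlen(line) + 1;
  char *copy = malloc(n);
  size_t slot;
  if (!copy) return 0;
  memcpy(copy, line, n);
  slot = lines_seen % RETAINED_LINES;
  free(retained[slot]);
  retained[slot] = copy;
  return ++lines_seen;
}

/* Look up a line by its 1-based number. Returns NULL if that line has not
 * been read yet or has already been evicted. */
const char *retained_line(unsigned long line_no) {
  if (line_no == 0 || line_no > lines_seen) return NULL;
  if (lines_seen - line_no >= RETAINED_LINES) return NULL;
  return retained[(line_no - 1) % RETAINED_LINES];
}

/* Release all retained lines so another input can be scanned. */
void reset_retained_lines(void) {
  size_t i;
  for (i = 0; i < RETAINED_LINES; ++i) {
    free(retained[i]);
    retained[i] = NULL;
  }
  lines_seen = 0;
}
```

With this layout an error report can ask for the line on which the offending token started even when the lookahead token has already pulled in a later line, as long as no more than RETAINED_LINES lines separate the two.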
