Flex and terminating state machine for reading strings


Question

My flex file is given below. Beyond a few trivial symbols, it defines a state machine (an exclusive start condition) for reading strings: it starts whenever it encounters a " and terminates when it finds the following ". When I feed the scanner an input with two strings, one after the other, like this:

"this" "apple"

It correctly identifies this but fails to find apple. Why does this happen? I have put in a BEGIN(INITIAL) statement, but it does not help.

/* sample simple scanner 
*/
%{    
int num_lines = 0;
#define CLASS 10
#define LAMBDA 1
#define DOT    2
#define PLUS   3
#define OPEN   4
#define CLOSE  5
#define NUM    6
#define ID     7
#define INVALID 8
#define MAX_STR_CONST 256;
#define COMMENT 11;

char string_buf[256];
char *string_buf_ptr;

char string_buf_cmnt[256];
char *string_buf_ptr_cmnt;
 int size = 0;
%}     
%x str
%x comment1
%x comment2
%%


\"     {
  string_buf_ptr = (char*)malloc(8); size = 0; BEGIN(str);}
<str>\"        {           /* saw closing quote - all done */
  /* return string constant token type and
   * value to parser
   */

  *string_buf_ptr = '\0';  /* terminate the collected string with a null */

  string_buf_ptr = string_buf_ptr - size; /* scale back string ptr to start */

  int i = 0;

  for (; i < size; i++){
    yytext[i]=*(string_buf_ptr + i); /* copy each character to yytext */
  }

  yytext[i]='\0';             /* terminate the copy in yytext with a null */
  free(string_buf_ptr);

  BEGIN(INITIAL);            /* go back to start */
  return ID;
 }
<str>\n        {
  /* error - unterminated string constant */
  /* generate error message */
  //printf("error is here\n"); 
 }
<str>\\0        ;
<str>\\[0-7]{1,3} {
  /* octal escape sequence */
  int result;
  (void) sscanf( yytext + 1, "%o", &result );
  if (result == 0x00){
     *string_buf_ptr++ = '0';
  } else {
    if ( result > 0xff ){
      /* error, constant is out-of-bounds */}
    else{*string_buf_ptr++ = result;}
  }
       size++;
 }
<str>\\[0-9]+ {
  /* generate error - bad escape sequence; something
   * like '\48' or '\0777777'
   */
 }
<str>\\n  *string_buf_ptr++ = '\n';  size++; 
<str>\\t  *string_buf_ptr++ = '\t';  size++;
<str>\\r  *string_buf_ptr++ = '\r';  size++;
<str>\\b  *string_buf_ptr++ = '\b';  size++;
<str>\\f  *string_buf_ptr++ = '\f';  size++;
<str>\\a  *string_buf_ptr++ = '\a';  size++;

<str>\\(.|\n)  *string_buf_ptr++ = yytext[1];  size++;  

<str>[^\\\n\"]+        {
  //printf("there\n");
  char *yptr = yytext;
  int i = 0;
  while ( *yptr )
    {
      *string_buf_ptr++ = *yptr++;
      yytext[i] = *(string_buf_ptr-1);
      size++;
      i++;
    }
}
[ ]+     //printf("space\n");
%%


main(int argc, char **argv) {
  int res;
  yyin = stdin;

  while(res = yylex()) {  
    printf("class: %d lexeme: %s line: %d\n", res, yytext, num_lines); 
  }
} 


Recommended answer

You can't overwrite yytext like that. yytext is not guaranteed to point at usable memory beyond the current token, and in any case you are not allowed to modify yytext outside of the current token.

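In the closing-quote rule, the current token is just the one-character ", so everything you write past its first byte lands on input that has not been scanned yet. What happens is that you end up copying the string this over the top of the pending input, which overwrites the " that starts the second string. As a result, the second string is never recognized as a string.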
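To make the overwrite concrete, here is a minimal sketch in plain C. It is a simplified model, not flex's real buffer handling: buffer stands in for the scanner's input buffer, token_offset for the position yytext points at, and overwrite_like_question is a hypothetical helper that repeats the copy done in the closing-quote action.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified model (an assumption, not flex internals): `buffer` plays the
 * role of the scanner's input buffer and `token_offset` is where yytext
 * points when the closing quote has just been matched. */
void overwrite_like_question(char *buffer, size_t token_offset,
                             const char *collected)
{
    assert(buffer != NULL && collected != NULL);
    /* Copies the collected characters plus the terminating null, exactly
     * like the loop and the final '\0' store in the question's action. */
    memcpy(buffer + token_offset, collected, strlen(collected) + 1);
}
```

With the buffer holding "this" "apple", the closing quote of the first string sits at offset 5 and the opening quote of the second at offset 7. Copying this plus its null terminator to offset 5 writes offsets 5 through 9, so the quote at offset 7 is replaced by the letter i before the scanner ever reads it.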

Instead of overwriting yytext, just make your string_buf_ptr visible to the caller of yylex, either by making it a global variable or by passing a pointer to a return value as an extra argument to the lexer (see the YY_DECL macro). Of course, that will force you to change your memory management strategy, but your current memory management won't work anyway: you allocate only 8 bytes, and some tokens are likely to be more than seven characters long.

Personally, I'd avoid the global and keep a static char* that is passed back to the caller via an out parameter. Then you can require that the caller make a copy of the string if they need to keep it beyond the next call to yylex. You could instead insist that the caller free the string, but the advantage of the "caller copies" strategy is that no copy is made if the caller doesn't need to persist the string. This is precisely the strategy used with yytext: yytext will be destroyed by the next call to yylex, so a caller that needs to persist the token's value has to make a copy of yytext.
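The "caller copies" strategy can be sketched in plain C as follows. This is only an illustration under stated assumptions: next_string is a hypothetical hand-written stand-in for yylex (in a real flex scanner the extra parameter would be declared through YY_DECL), TOK_END and TOK_ID are made-up names, and escape sequences are left out. The buffer is static, owned by the scanner, and grown on demand, so strings longer than seven characters are not a problem.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { TOK_END = 0, TOK_ID = 7 };   /* 7 mirrors the ID define in the question */

/* Hypothetical stand-in for yylex(): scans the next double-quoted string
 * from *input and hands its text back through *value_out. The text lives
 * in a static buffer and is only valid until the next call. */
int next_string(const char **input, const char **value_out)
{
    static char *buf = NULL;        /* owned by the scanner, reused each call */
    static size_t cap = 0;
    const char *p;
    size_t len = 0;

    assert(input != NULL && *input != NULL && value_out != NULL);
    p = *input;
    *value_out = NULL;

    while (*p == ' ')
        p++;                        /* skip blanks, like the [ ]+ rule */
    if (*p != '"') {
        *input = p;
        return TOK_END;             /* no further string in the input */
    }
    p++;                            /* step over the opening quote */

    for (;;) {
        if (len + 1 >= cap) {       /* keep room for one more char plus null */
            size_t new_cap = cap ? cap * 2 : 16;
            char *grown = realloc(buf, new_cap);
            if (grown == NULL)
                return TOK_END;     /* out of memory: give up in this sketch */
            buf = grown;
            cap = new_cap;
        }
        if (*p == '\0' || *p == '"')
            break;
        buf[len++] = *p++;
    }
    buf[len] = '\0';

    if (*p != '"') {                /* unterminated string constant */
        *input = p;
        return TOK_END;
    }
    *input = p + 1;                 /* continue after the closing quote */
    *value_out = buf;
    return TOK_ID;
}
```

A caller that only prints the value can use the returned pointer directly. A caller that needs the text after the next call has to copy it first, for example with malloc and strcpy, because the next call reuses the same buffer.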
