Flex and terminating state machine for reading strings
Question
My flex file is given below. Beyond trivial symbols, it defines a state machine to read strings: it enters the string state whenever it encounters a " and leaves it on the following ". When I feed this scanner an input with two strings, one after the other, like this:
"this" "apple"
It correctly identifies this but fails to find apple. Why does this happen? I have put in BEGIN(INITIAL), but it does not work.
/* sample simple scanner
*/
%{
int num_lines = 0;
#define CLASS 10
#define LAMBDA 1
#define DOT 2
#define PLUS 3
#define OPEN 4
#define CLOSE 5
#define NUM 6
#define ID 7
#define INVALID 8
#define MAX_STR_CONST 256;
#define COMMENT 11;
char string_buf[256];
char *string_buf_ptr;
char string_buf_cmnt[256];
char *string_buf_ptr_cmnt;
int size = 0;
%}
%x str
%x comment1
%x comment2
%%
\" {
    string_buf_ptr = (char*)malloc(8); size = 0; BEGIN(str);
}
<str>\" { /* saw closing quote - all done */
    /* return string constant token type and
     * value to parser
     */
    *string_buf_ptr = '\0'; /* append the end of string with null */
    string_buf_ptr = string_buf_ptr - size; /* scale back string ptr to start */
    int i = 0;
    for (; i < size; i++){
        yytext[i]=*(string_buf_ptr + i); /* copy each character to yytext */
    }
    yytext[i]='\0'; /* append the end of string with null */
    free(string_buf_ptr);
    BEGIN(INITIAL); /* go back to start */
    return ID;
}
<str>\n {
    /* error - unterminated string constant */
    /* generate error message */
    //printf("error is here\n");
}
<str>\\0 ;
<str>\\[0-7]{1,3} {
    /* octal escape sequence */
    int result;
    (void) sscanf( yytext + 1, "%o", &result );
    if (result == 0x00){
        *string_buf_ptr++ = '0';
    } else {
        if ( result > 0xff ){
            /* error, constant is out-of-bounds */
        } else {
            *string_buf_ptr++ = result;
        }
    }
    size++;
}
<str>\\[0-9]+ {
    /* generate error - bad escape sequence; something
     * like '\48' or '\0777777'
     */
}
<str>\\n *string_buf_ptr++ = '\n'; size++;
<str>\\t *string_buf_ptr++ = '\t'; size++;
<str>\\r *string_buf_ptr++ = '\r'; size++;
<str>\\b *string_buf_ptr++ = '\b'; size++;
<str>\\f *string_buf_ptr++ = '\f'; size++;
<str>\\a *string_buf_ptr++ = '\a'; size++;
<str>\\(.|\n) *string_buf_ptr++ = yytext[1]; size++;
<str>[^\\\n\"]+ {
    //printf("there\n");
    char *yptr = yytext;
    int i = 0;
    while ( *yptr )
    {
        *string_buf_ptr++ = *yptr++;
        yytext[i] = *(string_buf_ptr-1);
        size++;
        i++;
    }
}
[ ]+ //printf("space\n");
%%
main(int argc, char **argv) {
    int res;
    yyin = stdin;
    while(res = yylex()) {
        printf("class: %d lexeme: %s line: %d\n", res, yytext, num_lines);
    }
}
Recommended answer
You can't overwrite yytext like that. yytext is not guaranteed to point at usable memory beyond the current token, and in any case you're not allowed to modify yytext outside of the current token.
When the closing quote is matched, yytext is only one character long (the " itself). So what's happening is that you end up copying this, plus a terminating NUL, over the top of the pending input, which overwrites the " that starts the second string. The second string is therefore never recognized as a string.
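The effect can be reproduced outside flex with a plain character array. The following is a minimal sketch, assuming the input was "this" "apple"; the helper names overwrite_token and second_quote_survives are made up for illustration, and the array is only a simplified model that does not reproduce flex's real buffer handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for what the <str>\" action does: copy `len`
 * saved characters plus a NUL to `token`, which points at the
 * one-character match (the closing quote) inside the input buffer.
 */
static void overwrite_token(char *token, const char *saved, size_t len)
{
    memcpy(token, saved, len);   /* writes past the 1-character token */
    token[len] = '\0';
}

/* Returns 1 if the opening quote of the second string survives the copy. */
static int second_quote_survives(void)
{
    char buf[] = "\"this\" \"apple\"";  /* simulated pending input */
    char *token = buf + 5;               /* closing quote of the first string */
    size_t second_quote = 7;             /* index of the quote before apple */

    assert(buf[second_quote] == '"');
    overwrite_token(token, "this", 4);   /* clobbers buf[5] through buf[9] */
    return buf[second_quote] == '"';
}
```

In this model the copy lands on indices 5 through 9, so the quote at index 7 is replaced by a letter and nothing is left to trigger the string start condition a second time.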
Instead of overwriting yytext, just make your string_buf_ptr visible to the caller of yylex, either by making it a global variable or by passing a pointer to a return value as an extra argument to the lexer (see the YY_DECL macro). Of course, that will force you to change your memory management strategy, but your current memory management won't work either: malloc(8) leaves room for only seven characters plus the terminating NUL, and some tokens are likely to be longer than that.
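One way to lift the seven-character limit is to accumulate the string in a buffer that grows on demand. This is a minimal sketch under that assumption; struct strbuf, strbuf_reset and strbuf_append are hypothetical names, not part of flex.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; none of these names come from flex. */
struct strbuf {
    char  *data;   /* NUL-terminated contents, or NULL before first use */
    size_t len;    /* characters stored, not counting the NUL */
    size_t cap;    /* bytes allocated for data */
};

/* Empty the buffer but keep its allocation for the next string. */
static void strbuf_reset(struct strbuf *b)
{
    assert(b != NULL);
    b->len = 0;
    if (b->data != NULL)
        b->data[0] = '\0';
}

/* Append n bytes, growing as needed. Returns 0 on success, -1 on failure. */
static int strbuf_append(struct strbuf *b, const char *src, size_t n)
{
    assert(b != NULL && src != NULL);
    size_t need = b->len + n + 1;        /* +1 for the terminating NUL */
    if (need > b->cap) {
        size_t cap = b->cap ? b->cap : 16;
        while (cap < need)
            cap *= 2;
        char *p = realloc(b->data, cap); /* realloc(NULL, ...) acts as malloc */
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}
```

In the scanner, the action for the opening quote would call strbuf_reset, and the action for ordinary characters would call strbuf_append with yytext and yyleng, leaving yytext itself untouched.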
Personally, I'd avoid the global and keep a static char* that is passed back to the caller via an out parameter. You can then require the caller to copy the string if they need it beyond the next call to yylex. You could instead insist that the caller free the string, but the advantage of the "caller copies" strategy is that no copy is made if the caller doesn't need to persist the string. This is precisely the strategy used with yytext: yytext will be destroyed by the next call to yylex, so a caller needing to persist the token's value must make a copy of yytext.
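The "caller copies" contract can be illustrated without flex. The sketch below uses a small hand-written function, next_string, as a hypothetical stand-in for a yylex that has been given an out parameter; copy_string and the TOK_ names are likewise assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { TOK_EOF = 0, TOK_ID = 7 };  /* 7 mirrors the question's #define ID */

/*
 * Hypothetical stand-in for a scanner with an out parameter. Scans the next
 * double-quoted string starting at *cursor and stores a pointer to it in *out.
 * The storage belongs to the scanner and is reused by the next call.
 */
static int next_string(const char **cursor, const char **out)
{
    static char buf[256];
    assert(cursor != NULL && *cursor != NULL && out != NULL);
    const char *p = *cursor;
    size_t n = 0;

    while (*p != '\0' && *p != '"')      /* skip to the opening quote */
        p++;
    if (*p == '\0') {
        *cursor = p;
        return TOK_EOF;
    }
    p++;                                  /* step past the opening quote */
    for (; *p != '\0' && *p != '"'; p++)
        if (n < sizeof buf - 1)
            buf[n++] = *p;                /* characters beyond the buffer are dropped */
    buf[n] = '\0';
    if (*p == '"')
        p++;                              /* step past the closing quote */
    *cursor = p;
    *out = buf;
    return TOK_ID;
}

/* Caller-side copy, for callers that need the value after the next call. */
static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```

A caller that only prints each token never calls copy_string and so pays for no allocation; a caller that stores tokens copies them and frees the copies itself.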