yytext contains characters that are not in the match
Background
I am using flex to generate a lexer for a programming language I am implementing.
I have some problems with this rule for identifiers:
[a-zA-Z_][a-zA-Z_0-9]* {
printf("yytext is '%s'\n", yytext);
yylval.s = yytext;
return TOK_IDENTIFIER;
}
The rule works as it should when my parser is parsing expressions like this:
var0 = var1 + var2;
The printf statement will print out this:
yytext is 'var0'
yytext is 'var1'
yytext is 'var2'
Which is what it should print.
The problem
But when my parser is parsing function declarations like this one:
func(array[10] type, arg2 wef, arg3 afe);
Now the printf statement will print this:
yytext is 'array['
yytext is 'arg2 wef'
yytext is 'arg3 afe'
The problem is that yytext contains characters that are not in the match.
Question
Why does flex include these characters in yytext, and how can I solve this problem?
Answer
I don't see how that output could be produced by your lexer, but it is easy to see how it could be produced in your parser.
Basically, it is not correct to retain the value of yytext (the pointer itself):
yylval.s = yytext; /* DON'T DO THIS */
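In effect, that is a dangling pointer, because yytext points to private memory inside the lexer framework, and the pointer is only valid until the next time the lexer is called. Since the parser generally needs to look at the next input token before executing a reduction action, it is almost certain that the pointer in the s member of each terminal in the production will have been invalidated by the time the action is executed.
With flex, yytext normally points straight into the scanner's input buffer, and flex marks the end of the current match by temporarily overwriting the following character with a NUL. When the lexer is called again, that character is put back and the NUL moves to the end of the new match, so an old pointer now shows the earlier token followed by everything up to the end of the lookahead token. That is why the saved "array" reads 'array[' once the parser has fetched the lookahead [, and the saved "arg2" reads 'arg2 wef' once it has fetched wef.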
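To make the effect concrete, here is a small standalone C sketch that imitates this behavior. It is not flex's code: toy_scanner and toy_next are hypothetical names, and the tokenizer is deliberately crude. It only assumes what is described above, namely that each match is NUL-terminated in place inside the input buffer and the overwritten character is restored when scanning resumes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a flex-style scanner (hypothetical, not flex's real internals).
 * buf plays the role of the input buffer that yytext points into. */
struct toy_scanner {
    char *buf;        /* input text, modified in place while scanning */
    size_t pos;       /* index where the next scan starts */
    char hold_char;   /* character overwritten by the current NUL terminator */
    size_t hold_pos;  /* index of that NUL terminator */
    int has_hold;     /* nonzero once a terminator has been written */
};

/* Return the next token as a pointer into buf, NUL-terminated in place the
 * way flex terminates yytext. Tokens are runs of letters, digits and '_',
 * or single other characters; spaces are skipped. Returns NULL at the end. */
static char *toy_next(struct toy_scanner *s) {
    assert(s != NULL && s->buf != NULL);

    /* Put back the character hidden by the previous terminator, so any
     * pointer to the previous token no longer ends where it used to. */
    if (s->has_hold) {
        s->buf[s->hold_pos] = s->hold_char;
        s->has_hold = 0;
    }

    while (s->buf[s->pos] == ' ')
        s->pos++;
    if (s->buf[s->pos] == '\0')
        return NULL;

    size_t start = s->pos;
    if (isalnum((unsigned char)s->buf[start]) || s->buf[start] == '_') {
        while (isalnum((unsigned char)s->buf[s->pos]) || s->buf[s->pos] == '_')
            s->pos++;
    } else {
        s->pos++; /* single-character token such as '[' or ',' */
    }

    /* Terminate the match in place, remembering what was overwritten. */
    s->hold_pos = s->pos;
    s->hold_char = s->buf[s->pos];
    s->buf[s->pos] = '\0';
    s->has_hold = 1;
    return s->buf + start;
}
```

Scanning the declaration from the question, saving the pointer returned for array and then fetching one more token (the lookahead [) leaves the saved pointer reading "array[", the same symptom the question shows.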
If you want to keep the string value of the token pointed to by yytext, you must copy it:
yylval.s = strdup(yytext);
and then you will be responsible for freeing the copy when you no longer need it.
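strdup is a POSIX function (it only became part of ISO C in C23), so where it is unavailable the same copy can be made with malloc and memcpy, using flex's yyleng for the length. The sketch below is one way to do that; copy_token is a hypothetical name, not part of flex.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy the current token into its own heap block.
 * In a flex action it would be called as copy_token(yytext, yyleng); for
 * ordinary tokens the result matches strdup(yytext), using only ISO C. */
static char *copy_token(const char *text, size_t len) {
    assert(text != NULL);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL; /* allocation failed; the caller must handle this */
    memcpy(copy, text, len);
    copy[len] = '\0';
    return copy;
}
```

The copy is unaffected when the scanner later restores or overwrites its buffer. The caller owns every returned string and must free it exactly once, typically in the parser action that consumes the token. If the parser is generated by bison, its %destructor directive can additionally free strings that are discarded during error recovery.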