yytext contains characters not in match

Views: 103  Published: 2020/4/30 10:09:08  Tags: regex string match flex-lexer lex

Background

I am using flex to generate a lexer for a programming language I am implementing.

I have some problems with this rule for identifiers:

[a-zA-Z_][a-zA-Z_0-9]* {
    printf("yytext is '%s'\n", yytext);
    yylval.s = yytext;
    return TOK_IDENTIFIER;
}

The rule works as it should when my parser is parsing expressions like this:

var0 = var1 + var2;

The printf statement will print out this:

yytext is 'var0'
yytext is 'var1'
yytext is 'var2'

Which is what it should print.

The problem

But when my parser is parsing function declarations like this one:

func(array[10] type, arg2 wef, arg3 afe);

Now the printf statement will print this:

yytext is 'array['
yytext is 'arg2 wef'
yytext is 'arg3 afe'

The problem is that yytext contains characters that are not in the match.

Question

Why does flex include these characters in yytext and how can I solve this problem?

Solution

I don't see how that output could be produced from your lexer, but it is easy to see how it could be produced in your parser.

Basically, it is not correct to retain the value of yytext:

yylval.s = yytext;  /* DON'T DO THIS */

In effect, that is a dangling pointer because yytext is pointing to private memory inside the lexer framework, and the pointer is only valid until the next time the lexer is called. Since the parser generally needs to look at the next input token before executing a reduction action, it is almost certain that the pointer in the s member of each terminal in the production will have been invalidated by the time the action is executed.

If you want to keep the string value of the token pointed to by yytext, you must copy it:

yylval.s = strdup(yytext);

and then you will be responsible for freeing the copy when you no longer need it.
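The copy-then-free pattern can be sketched in plain C as follows. copy_token is a hypothetical helper that does what strdup does, written with malloc and memcpy so the sketch does not depend on POSIX; scanner_buf only imitates a scanner buffer that gets reused, and is not part of flex.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of text, or NULL on failure. */
static char *copy_token(const char *text) {
    size_t len = strlen(text) + 1;    /* include the terminating NUL */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, text, len);
    return copy;
}

/* Returns 1 if the copy still holds the token after the buffer is reused. */
static int copy_survives_reuse(void) {
    char scanner_buf[16] = "arg2";    /* imitation of the scanner's buffer */
    char *owned = copy_token(scanner_buf);
    if (owned == NULL)
        return 0;
    strcpy(scanner_buf, "wef");       /* the scanner moves on to the next token */
    int ok = strcmp(owned, "arg2") == 0;
    free(owned);                      /* the owner releases the copy when done */
    return ok;
}
```

In a real grammar the free call would go wherever the identifier's value is last used, typically in the parser action that consumes it.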
