Character-by-character description of flex scanner

Question

I am having a really hard time tracking down a bug in a rather large flex/bison parser (1000 grammar rules, 1500 states, 400 terminals). The scanner matches a terminal that should not arise at this particular point and is not present in the data file.

The input I am trying to parse is

<el Re="1.0" Im="-1.0"/>

The last few lines of the output are

Reading a token: Next token is token ELEMENTTEXT (1.1-1.1: )
matched 4 characters:  Re=
matched 1 characters: "
matched 6 characters: -1 Im=

This looks like a memory corruption, since '-1 Im' is not present in the source. I expected the next token to be '1.0', which matches the token aNumber.

I have checked everything I can think of and turned on bison debugging, which only confused me more. I am now trying to step through the innards of the scanner one character at a time. Is there any tool that could give me output along these lines:

next character matched "x" - possible terminals
    ONE
    TWO
    SEVEN
...

Recommended answer

I gather that the debugging output being shown is generated in the parser, rather than from the scanner. The best way to see debugging output in the scanner is to generate the scanner using the -d or --debug command-line option, or to put %option debug in your flex scanner definition. That will print a line to stderr for every matched rule.
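
As a rough illustration of the %option debug route, here is a minimal flex definition. The single number pattern and the token code 1 are placeholders invented for this sketch, not taken from the asker's grammar. With the option enabled, the generated scanner reports each accepted rule on stderr, and the global yy_flex_debug can be set to 0 at run time to silence the trace.

```
%option debug noyywrap

%%
[0-9]+"."[0-9]+   { return 1; /* placeholder token code for a number */ }
.|\n              { /* ignore everything else in this sketch */ }
%%

int main(void) {
    /* Tracing is already on by default under %option debug;
       assign 0 here to turn it off without regenerating the scanner. */
    yy_flex_debug = 1;
    while (yylex() != 0) {
        /* a real program would hand the token to the parser */
    }
    return 0;
}
```

Each trace line names the rule by its line number in the scanner definition together with the matched text, which is usually enough to see which pattern claimed the input.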

DFA-based regex recognition does not provide meaningful character-by-character debugging output; in theory, the progress of the state machine could be traced but it would be very difficult to interpret and probably not all that useful.

The apparently corrupted information in your debugging output in the parser is most likely the result of a scanner action like this:

{some_pattern}       { /* DO NOT DO THIS */ yylval.str = yytext; 
                       return SOME_TOKEN;
                     }

The value of yytext and the memory it points into are private to the scanner yylex, and the values can change without notice. In particular, once yylex is called again to scan the lookahead token, the buffer may well be moved around in unpredictable ways.

Instead, you must make a copy of the token string (and remember to free the copy when you no longer need it):

{some_pattern}       { yylval.str = strdup(yytext); 
                       return SOME_TOKEN;
                     }

Note: If you don't want to use strdup (perhaps because your token might include NUL characters), a good alternative is:

char* buf = malloc(yyleng + 1); /* No need to call strlen */
memcpy(buf, yytext, yyleng);    /* Works even if there is a NUL in the token */
buf[yyleng] = 0;                /* Remember to NUL-terminate the copy */

References: the flex manual's note on yytext and memory / the bison FAQ entry on strings being destroyed
