How to handle nested comments in flex

Question

I am writing a flex scanner for a language that supports nested comments like this:

/*
/**/
*/

I used to work with OCaml/ocamllex, which supports recursive calls to the lexer very elegantly. Now that I am switching to C++/flex, how should I handle nested comments like these?

Recommended answer

Assuming that only comments can be nested inside comments, a stack is a very expensive solution for something that can be achieved with a simple counter. For example:

%x SC_COMMENT

%%
  int comment_nesting = 0;  /* Line 4 */

"/*"             { BEGIN(SC_COMMENT); }
<SC_COMMENT>{
  "/*"           { ++comment_nesting; }
  "*"+"/"        { if (comment_nesting) --comment_nesting;
                   else BEGIN(INITIAL); }
  "*"+           ; /* Line 11 */
  [^/*\n]+       ; /* Line 12 */
  [/]            ; /* Line 13 */
  \n             ; /* Line 14 */
}

Some explanations:

Line 4: Indented lines before the first rule are inserted at the top of the yylex function, where they can be used to declare and initialize local variables. We use this to initialize the comment nesting depth to 0 on every call to yylex. The invariant that must be maintained is that comment_nesting is always 0 in the INITIAL state.

Lines 11-13: A simpler solution would have been the single pattern .|\n, but that would result in every comment character being treated as a separate subtoken. Even though the corresponding action does nothing, the scan loop would be broken and the action switch statement executed for every character. So it is usually better to try to match several characters at once.

We need to be careful about the / and * characters, though: we can only ignore those asterisks that we are certain are not part of the */ that terminates the (possibly nested) comment. Hence lines 11 and 12. (Line 11 won't match a sequence of asterisks that is followed by a /, because flex prefers the longest match and that sequence is matched by the longer pattern at line 9.) We also need to ignore a / that is not followed by a *. Hence line 13.
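To make the counter idea concrete outside of flex, here is a minimal hand-written sketch of the same logic in C++. The function name skip_nested_comment and its interface are hypothetical, chosen only for illustration; they are not part of flex. The variable depth plays the role of comment_nesting, and a lone / or * is simply skipped, as in lines 11-13.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of flex): `open` must be the index of a "/*"
// in `text`. Returns the index just past the matching "*/", or
// std::string_view::npos if the comment is never closed.
std::size_t skip_nested_comment(std::string_view text, std::size_t open) {
    std::size_t depth = 0;          // plays the role of comment_nesting
    std::size_t i = open + 2;       // step over the outermost "/*"
    while (i < text.size()) {
        const bool has_next = i + 1 < text.size();
        if (text[i] == '/' && has_next && text[i + 1] == '*') {
            ++depth;                // nested opener
            i += 2;
        } else if (text[i] == '*' && has_next && text[i + 1] == '/') {
            i += 2;
            if (depth == 0) return i;   // outermost comment closed
            --depth;                    // a nested comment closed
        } else {
            ++i;                    // anything else, including a lone '/' or '*'
        }
    }
    return std::string_view::npos;  // input ended inside a comment
}
```

Note that an input such as /*/ is reported as unterminated: the trailing / is not preceded by an asterisk that belongs to a closer, which matches how the flex rules treat it.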

Line 14: However, matching too large a token can also be sub-optimal.

First, flex is not optimized for large tokens, and comments can be very large. If flex needs to refill its buffer in the middle of a token, it retains the open token in the new buffer and then rescans from the beginning of the token.

Second, flex scanners can automatically track the current line number, and they do so relatively efficiently. The scanner checks for newlines only in tokens matched by patterns that could possibly match a newline. But when such a pattern does match, the entire matched text has to be scanned.

We reduce the impact of both of these issues by matching newline characters inside comments as individual tokens. (See line 14, and note that the pattern at line 12 excludes newlines.) This limits the yylineno scan to a single character, and it also limits the expected length of internal comment tokens. The comment itself might be very large, but each line is likely to be of a reasonable length, which avoids the potentially quadratic rescan on buffer refill.
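The effect of these patterns on token size can be illustrated with a small simulation. The sketch below is plain C++, not flex output, and comment_chunks is a hypothetical name. Assuming flex's longest-match rule, it splits text that lies inside a comment into the pieces the SC_COMMENT patterns would match.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: splits text found inside a comment into the pieces
// the SC_COMMENT patterns would match, assuming longest-match semantics.
std::vector<std::string> comment_chunks(std::string_view text) {
    std::vector<std::string> chunks;
    std::size_t i = 0;
    while (i < text.size()) {
        std::size_t j = i + 1;      // every chunk has at least one character
        const char c = text[i];
        if (c == '\n') {
            // \n : always a chunk of its own
        } else if (c == '/') {
            if (j < text.size() && text[j] == '*') ++j;     // "/*", else [/]
        } else if (c == '*') {
            while (j < text.size() && text[j] == '*') ++j;  // "*"+
            if (j < text.size() && text[j] == '/') ++j;     // "*"+"/"
        } else {
            // [^/*\n]+ : run of ordinary characters
            while (j < text.size() && text[j] != '/' &&
                   text[j] != '*' && text[j] != '\n') ++j;
        }
        chunks.emplace_back(text.substr(i, j - i));
        i = j;
    }
    return chunks;
}
```

For example, the text "ab**c/d\n*/" is split into ab, **, c, /, d, a newline, and */. No chunk other than the newline itself contains a newline, so the length of each chunk is bounded by the length of a line.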
