Regex clarification on escape sequences with lex
Question
I'm creating a lexer.l file that is working as intended except for one part. I have the rule:
[\(\*.*\*\)] {}
I want this rule to make the lexer do nothing when it encounters (* this is a test *) in a file. However, when I run lex lexer.l, I get warnings on the lines with the rules \(, \*, and \) stating that they can never be matched. So my question is: why would [\(\*.*\*\)] {} interfere with \( and the others, and how can I catch (* this is a test *)?
Recommended answer
Languages with the comment syntax (*…*) typically allow nested comments, and nested comments cannot easily be recognized by (f)lex because nesting requires a context-free grammar, whereas a lexical scanner only recognizes regular languages.
If your comments do not nest (so that (* something (* else *) is a complete comment, rather than the prefix of a longer comment), then you can use the regular expression:
[(][*][^*]*[*]+([^*)][^*]*[*]+)*[)]
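The pattern above can be checked outside of lex. The following is a minimal sketch in Python, on the assumption that Python's re module treats these particular patterns the same way (f)lex does; the names COMMENT and CHAR_CLASS are made up for illustration. It also shows why the bracketed rule from the question triggers the warnings: square brackets form a character class, so [\(\*.*\*\)] matches exactly one character out of (, *, . and ), and when that rule comes first the later single-character rules \(, \* and \) can never be chosen.

```python
import re

# Pattern from the answer, for non-nested (* ... *) comments.
COMMENT = re.compile(r"[(][*][^*]*[*]+([^*)][^*]*[*]+)*[)]")

# Bracketed pattern from the question: a character class that
# matches ONE character from the set ( * . )
CHAR_CLASS = re.compile(r"[\(\*.*\*\)]")

text = "(* this is a test *)"

comment_match = COMMENT.match(text)
class_match = CHAR_CLASS.match(text)

print(repr(comment_match.group(0)))  # the whole comment
print(repr(class_match.group(0)))    # only the opening parenthesis

# Without nesting support, the comment ends at the first "*)".
nested = "(* something (* else *) tail *)"
nested_match = COMMENT.match(nested)
print(repr(nested_match.group(0)))
```

The last case stops at the first closing *), which is the non-nesting behaviour described above.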
If you do require nested comments, you can use start conditions and a stack (or a simulated stack, as below):
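%x SC_COMMENT

%%
  /* Indented, so flex copies it into yylex() instead of reading it as a rule. */
  int comment_nesting = 0;

"(*"             { BEGIN(SC_COMMENT); /* outermost opener */ }
<SC_COMMENT>{
  "(*"           { ++comment_nesting; /* nested opener */ }
  "*"+")"        { if (comment_nesting) --comment_nesting;
                   else BEGIN(INITIAL); }
  "*"+           ;
  [^(*\n]+       ;
  [(]            ;
  \n             ;
}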
That snippet was taken from this answer, with a small adjustment because that answer recognizes nested /*…*/ comments. A fuller explanation of the code appears there.
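For readers who want to experiment with the counting idea without generating a scanner, here is a rough Python sketch of the same logic. It is not the flex code and is not meant to replace it; the function name strip_nested_comments and the variable depth are hypothetical, with depth standing in for the start condition plus the comment_nesting counter.

```python
def strip_nested_comments(source: str) -> str:
    """Return source with (* ... *) comments removed, honouring nesting."""
    out = []
    depth = 0  # 0 means "outside any comment"
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair == "(*":
            # Opener: enter a comment or go one level deeper.
            depth += 1
            i += 2
        elif pair == "*)" and depth > 0:
            # Closer: leave one level of nesting.
            depth -= 1
            i += 2
        else:
            # Ordinary character: keep it only outside comments.
            if depth == 0:
                out.append(source[i])
            i += 1
    return "".join(out)


print(strip_nested_comments("keep(* x (* y *) z *)this"))  # prints keepthis
```

A single integer is enough here because there is only one kind of bracket to balance, which is why the answer can simulate the stack with a counter.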