Using flex for matching Python multiline strings with escaped characters

Question

I am trying to match Python multiline comments (triple-quoted strings) with flex and have run into trouble. The following regex works fine on Regexr, but flex does not recognize it, and I don't know how to fix it:

"""[^"\\]*(?:(?:\\.|"{1,2}(?!"))[^"\\]*)*"""

Previously, I used:

["]{3}(\\["])*(["]{0,2}[^"](\\["])*)*["]{3}

which can detect comments like:

"""A\"""A"""

However, it cannot handle multiple backslashes, as in:

'''A\\\\'''A=B'''C'''

The regex fails to split this into the following tokens:

'''A\\\\'''  (comment)   
A=B     
'''C'''(comment) 

Recommended answer

You can recognize Python long strings with a single regex. It's not pretty, but I believe it works:

["]{3}(["]{0,2}([^\\"]|\\(.|\n)))*["]{3}

This is fairly similar to your original regex, but it does not limit its backslash handling to \". Any backslash followed by any character (including a newline) is consumed as a pair, so \\ is correctly treated as an escaped backslash and the quote after it is not mistaken for an escaped quote.
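As a quick sanity check, the pattern can be tried outside flex. The sketch below uses Python's re module, which happens to accept this particular pattern unchanged; that is an assumption about this one regex, not a claim that flex and re share a syntax in general (flex, for instance, has no (?!...) lookahead, which is one reason the Regexr version from the question is rejected). The helper name first_long_string and the sample inputs are illustrative only.

```python
import re

# The answer's pattern, copied verbatim. In both flex and Python's re,
# "." excludes newline and a negated class such as [^\\"] includes it.
LONG_STRING = re.compile(r'["]{3}(["]{0,2}([^\\"]|\\(.|\n)))*["]{3}')


def first_long_string(text):
    """Return the triple-double-quoted string at the start of text, or None."""
    m = LONG_STRING.match(text)
    return m.group(0) if m else None


# An escaped quote inside the string: the whole input is one token.
print(first_long_string(r'"""A\"""A"""'))

# Four backslashes form two escaped pairs, so the match stops at the
# first closing delimiter instead of running on into A=B.
print(first_long_string(r'"""A\\\\"""A=B"""C"""'))
```

The question's second example uses ''' rather than """; the pattern as written covers only the double-quote form, so a scanner would need a second, analogous rule for the single-quote delimiter.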

A possibly easier to read (but slightly slower) solution is to use a start condition. Here I use yymore() to build a single token which does not include the """ delimiters, but production code would probably want to interpret Python's various backslash escapes. (It is precisely this need which motivates the use of a start condition rather than trying to recognize the entire string with a single regex.)

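%{
#include <stdlib.h>  /* malloc */
#include <string.h>  /* memcpy */
/* yylval and TOKEN_STRING are assumed to come from the parser's header. */
%}
%x SC_LONGSTRING
%%
["]{3}     BEGIN(SC_LONGSTRING);   /* opening delimiter, not kept in the token */
<SC_LONGSTRING>{
  [^\\"]+  yymore();               /* ordinary characters */
  \\(.|\n) yymore();               /* backslash plus any character, newline included */
  ["]["]?  yymore();               /* one or two quotes that do not close the string */
  ["]{3}   { BEGIN(INITIAL);       /* closing delimiter */
             /* yytext now holds the body followed by the closing """ */
             yylval.str = malloc(yyleng - 2);
             memcpy(yylval.str, yytext, yyleng - 3);
             yylval.str[yyleng - 3] = 0;
             return TOKEN_STRING;
           }
}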
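To make the control flow of the start condition easier to follow, here is a rough hand-written equivalent in Python. It is only a sketch of the same four rules, not generated scanner code; the function name scan_long_string and its return convention are made up for illustration.

```python
def scan_long_string(text, pos=0):
    # Scan a triple-double-quoted string starting at text[pos].
    # Returns (body, end): body excludes the delimiters, end is the index
    # just past the closing delimiter. Returns None if no complete string
    # starts at pos.
    if not text.startswith('"""', pos):
        return None
    i = pos + 3        # like BEGIN(SC_LONGSTRING) after the opening delimiter
    start = i
    while i < len(text):
        if text.startswith('"""', i):   # rule ["]{3}: closing delimiter
            return text[start:i], i + 3
        if text[i] == '\\':             # rule \\(.|\n): skip the escaped character
            i += 2
        else:                           # rules [^\\"]+ and ["]["]?
            i += 1
    return None                         # input ended inside the string


body, end = scan_long_string(r'"""A\\\\"""A=B"""C"""')
print(body)   # body of the first string, backslashes kept verbatim
print(end)    # scanning can resume here, at A=B
```

Like the flex version, this returns the raw body without decoding escapes; a scanner that interprets them would do so in the backslash branch.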
