Capturing text before and after a C-style code block with a Perl regular expression
Question
I am trying to capture some text before and after a C-style code block using a Perl regular expression. So far this is what I have:
use strict;
use warnings;

my $text = << "END";
int max(int x, int y)
{
    if (x > y)
    {
        return x;
    }
    else
    {
        return y;
    }
}
// more stuff to capture
END

# Regex to match a code block
my $code_block = qr/(?&block)
    (?(DEFINE)
        (?<block>
            \{                # Match opening brace
            (?:               # Start non-capturing group
                [^{}]++       # Match non-brace characters without backtracking
                |             # or
                (?&block)     # Recurse into the named group "block"
            )*                # Match 0 or more times
            \}                # Match closing brace
        )
    )/x;

# $2 ends up undefined after the match
if ($text =~ m/(.+?)$code_block(.+)/s){
    print $1;
    print $2;
}
I am having an issue with the 2nd capture group not being initialized after the match. Is there no way to continue a regular expression after a DEFINE block? I would think that this should work fine.
$2 should contain the comment below the block of code, but it doesn't, and I can't find a good reason why this isn't working.
Recommended answer
Capture groups are numbered left-to-right in the order their opening parentheses occur in the regex, not in the order they are matched. Here is a simplified view of your regex:
m/
    (.+?)                       # group 1
    (?:                         # the $code_block regex
        (?&block)
        (?(DEFINE)
            (?<block> ... )     # group 2
        )
    )
    (.+)                        # group 3
/xs
Named groups can also be accessed as numbered groups.
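To illustrate that a named group also occupies a numbered slot, here is a minimal sketch. The date-like string and the group name month are made up for this example and are not part of the question.

```perl
use strict;
use warnings;

# The named group is the second opening parenthesis, so it is also group 2.
my ($by_number, $by_name);
if ("2024-05" =~ /(\d{4})-(?<month>\d{2})/) {
    $by_number = $2;          # numbered access
    $by_name   = $+{month};   # named access to the same group
}
print "$by_number $by_name\n";
```

Both variables end up holding the same substring, which is why a named group inside a DEFINE block still shifts the numbering of every group that follows it.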
The 2nd group is the block group. However, this group is only used as a named subpattern, not as a capture. As such, the $2 capture value is undef.
As a consequence, the text after the code block will be stored in capture $3.
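The numbering described above can be checked with a small sketch. It reuses the $code_block regex from the question in condensed form; the shortened input string and the variable names $before, $group2, and $after are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Same structure as the question's regex: a call to the named
# subpattern, followed by its definition.
my $code_block = qr/(?&block)
    (?(DEFINE)
        (?<block> \{ (?: [^{}]++ | (?&block) )* \} )
    )/x;

my $text = "int f()\n{ if (x) { return 1; } }\n// trailing comment\n";

my ($before, $group2, $after);
if ($text =~ m/(.+?)$code_block(.+)/s) {
    # Group 2 is the block definition; the trailing text is group 3.
    ($before, $group2, $after) = ($1, $2, $3);
}
print defined $group2 ? "group 2 is set\n" : "group 2 is undef\n";
print "group 3: $after";
```

Reading $3 instead of $2 works, but it ties the calling code to the internals of $code_block, which is the motivation for the two approaches that follow.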
There are two ways to deal with this problem:
- For complex regexes, only use named captures. Consider a regex to be complex as soon as you assemble it from regex objects, or if captures are conditional. Here:
if ($text =~ m/(?<before>.+?)$code_block(?<afterwards>.+)/s){
    print $+{before};
    print $+{afterwards};
}
- Put all your defines at the end, where they can't mess up your capture numbering. For example, your $code_block regex would only define a named pattern which you then invoke explicitly.
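The second approach could look roughly like the sketch below. The name $define_block is chosen for this illustration, and the C snippet is a slightly compacted version of the one in the question. The regex object holds only the DEFINE group, so it matches the empty string wherever it is interpolated.

```perl
use strict;
use warnings;

# Assumption: $define_block is a name chosen for this sketch. It contains
# only the definition of the named subpattern and consumes no text.
my $define_block = qr/
    (?(DEFINE)
        (?<block>
            \{                    # opening brace
            (?: [^{}]++           # run of non-brace characters
              | (?&block)         # or a nested block
            )*
            \}                    # closing brace
        )
    )/x;

my $text = <<'END';
int max(int x, int y)
{
    if (x > y) { return x; }
    else       { return y; }
}
// more stuff to capture
END

my ($before, $after);
# The definitions come last, after both numbered captures,
# so the text around the block lands in $1 and $2.
if ($text =~ m/(.+?)(?&block)(.+)$define_block/s) {
    ($before, $after) = ($1, $2);
    print $before, $after;
}
```

Because the block group now opens after both capturing groups, it becomes group 3 and no longer shifts the numbers of the groups the caller reads.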