Capturing text before and after a C-style code block with a Perl regular expression

Views: 15 | Published: 2021/12/10 18:23:09 | Tags: regex, perl, regex-recursion

Question

I am trying to capture some text before and after a C-style code block using a Perl regular expression. So far this is what I have:

use strict;
use warnings;

my $text = << "END";
int max(int x, int y)
{
    if (x > y)
    {
        return x;
    }
    else
    {
        return y;
    }
}
// more stuff to capture
END

# Regex to match a code block
my $code_block = qr/(?&block)
(?(DEFINE)
    (?<block>
        {                # Match opening brace
            (?:           # Start non-capturing group
                [^{}]++   #     Match non-brace characters without backtracking
                |         #     or
                (?&block) #     Recursively match the last captured group
            )*            # Match 0 or more times
        }                # Match closing brace
    )
)/x;

# $2 ends up undefined after the match
if ($text =~ m/(.+?)$code_block(.+)/s){
    print $1;
    print $2;
}

I am having an issue with the 2nd capture group not being initialized after the match. Is there no way to continue a regular expression after a DEFINE block? I would think that this should work fine.

$2 should contain the comment below the block of code, but it doesn't, and I can't find a good reason why this isn't working.

Recommended answer

Capture groups are numbered left-to-right in the order they occur in the regex, not in the order they are matched. Here is a simplified view of your regex:

m/
  (.+?)  # group 1
  (?:  # the $code_block regex
    (?&block)
    (?(DEFINE)
      (?<block> ... )  # group 2
    )
  )
  (.+)  # group 3
/xs

Named groups can also be accessed as numbered groups.
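
As a minimal sketch of that point (the pattern and the variable names here are illustrative, not taken from the question), a named group can be read both through `%+` and through its position:

```perl
use strict;
use warnings;

# A named group still occupies a numbered slot, counted left to right.
my ($by_number, $by_name, $second);
if ("ab" =~ /(?<first>a)(b)/) {
    $by_number = $1;          # group 1, read by number
    $by_name   = $+{first};   # the same group, read by name
    $second    = $2;          # the unnamed group is number 2
}
print "$by_number $by_name $second\n";   # prints "a a b"
```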

The 2nd group is the block group. However, this group is never matched directly: it is only invoked as a named subpattern through (?&block), and captures made inside such a call are not kept once the call returns. As such, the $2 capture value is undef.

As a consequence, the text after the code-block will be stored in capture $3.
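
The numbering can be checked with a short script. This is a sketch under a few assumptions: the sample text is shortened, the variable names `$before`, `$group2`, and `$after` are made up for illustration, and the literal braces are escaped as `\{` and `\}` because some newer Perl versions reject or warn about unescaped braces in patterns.

```perl
use strict;
use warnings;

my $text = <<'END';
int f()
{
    return 1;
}
// trailing comment
END

# Same shape as the regex in the question: call the block, then define it.
my $code_block = qr/(?&block)
(?(DEFINE)
    (?<block>
        \{
            (?: [^{}]++ | (?&block) )*
        \}
    )
)/x;

my ($before, $group2, $after);
if ($text =~ m/(.+?)$code_block(.+)/s) {
    # Group 1 is (.+?), group 2 is the "block" definition, group 3 is (.+).
    ($before, $group2, $after) = ($1, $2, $3);
}

print defined $group2 ? "group 2 is set\n" : "group 2 is undef\n";
print "after: $after";
```

Reading `$3` instead of `$2` makes the original script work, but the numbering will shift again whenever another group is added to `$code_block`, which is why the two approaches below are more robust.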

There are two ways to deal with this problem:

For complex regexes, use only named captures. Consider a regex to be complex as soon as you assemble it from regex objects, or if captures are conditional. Here:

if ($text =~ m/(?<before>.+?)$code_block(?<afterwards>.+)/s){
    print $+{before};
    print $+{afterwards};
}

Put all your defines at the end, where they can't mess up your capture numbering. For example, your $code_block regex would only define a named pattern which you then invoke explicitly.
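
A rough sketch of this second approach follows. The name `$define_block` is hypothetical, and the braces are escaped as an assumption about newer Perl versions. The regex object contains only the DEFINE group, so it matches the empty string wherever it is interpolated, and the main pattern calls `(?&block)` explicitly.

```perl
use strict;
use warnings;

my $text = <<'END';
int max(int x, int y)
{
    if (x > y)
    {
        return x;
    }
    else
    {
        return y;
    }
}
// more stuff to capture
END

# Only defines the named pattern "block"; it consumes no text by itself.
my $define_block = qr/
(?(DEFINE)
    (?<block>
        \{                          # opening brace
            (?: [^{}]++ | (?&block) )*   # non-brace text or a nested block
        \}                          # closing brace
    )
)/x;

my ($before, $after);
# The definitions come last, so the real captures are groups 1 and 2.
if ($text =~ m/(.+?)(?&block)(.+)$define_block/s) {
    ($before, $after) = ($1, $2);
}

print "before: $before";
print "after: $after";
```

Because the `block` group is now the last group in the combined pattern, it takes number 3 and no longer displaces the capture that follows the code block.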
