How do I substitute overlapping matches with a Perl regex?

Problem description

I want to find all occurrences of "BBB" in a string and substitute each of them with "D", one at a time. For example, I have "ABBBBC" and want to produce "ADBC" and "ABDC" (first substitute the first BBB, then substitute the other BBB). Is there a nice way to do this in Perl?

$str = "ABBBBC";
for ( $str =~ m/B(?=BB)/g ) {
    # I match both the BBBs here, but how to substitute the relevant part?
}

I want to get this array: ('ADBC', 'ABDC'), which comes from changing either of the BBBs to a D. The string "ABBBBBC" would give me "ADBBC", "ABDBC" and "ABBDC".

Recommended answer

To get overlapping matches, you have to play around with Perl's pos function. From its documentation:

pos SCALAR
pos
Returns the offset of where the last m//g search left off for the variable in question ($_ is used when the variable is not specified). Note that 0 is a valid match offset. undef indicates that the search position is reset (usually due to match failure, but can also be because no match has yet been run on the scalar).
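As a small illustration of the paragraph above, the sketch below records what pos reports after each successful m//g match. The helper name offsets_after_matches is made up for this example and is not part of Perl or of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the offset reported by pos after every
# successful scalar-context m//g match of $re against $text.
sub offsets_after_matches {
    my ($text, $re) = @_;
    my @offsets;
    while ($text =~ /$re/g) {
        push @offsets, pos($text);   # offset just past the current match
    }
    return @offsets;
}

my @offsets = offsets_after_matches("ABBBBC", qr/B/);
print "@offsets\n";    # prints "2 3 4 5"
```

Each B is one character long, so pos lands one past the offset where it was found.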

pos directly accesses the location used by the regexp engine to store the offset, so assigning to pos will change that offset, and so will also influence the \G zero-width assertion in regular expressions. Both of these effects take place for the next match, so you can't affect the position with pos during the current match, such as in (?{pos() = 5}) or s//pos() = 5/e.
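To make the interaction between pos and \G concrete, here is a minimal sketch that assigns to pos before matching. The helper match_run_from is a hypothetical name used only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: set the search position by hand, then match a run
# of B characters that must begin exactly there (\G anchors at pos).
sub match_run_from {
    my ($text, $start) = @_;
    pos($text) = $start;             # takes effect for the next match
    if ($text =~ /\G(B+)/g) {
        my $run = $1;
        return ($run, pos($text));   # matched text and the new offset
    }
    return;                          # \G could not match at $start
}

my ($run, $end) = match_run_from("ABBBBC", 2);
print "$run $end\n";    # prints "BBB 5"
```

Starting at offset 0 instead yields nothing, because the character at that offset is A and \G does not let the match slide forward.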

Setting pos also resets the "matched with zero-length" flag, described under "Repeated Patterns Matching a Zero-length Substring" in perlre.

Because a failed m//gc match doesn't reset the offset, the return from pos won't change either in this case. See perlre and perlop.
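The difference between a failed m//g and a failed m//gc can be checked with a short sketch like the following. The helper pos_after_failure and its $keep flag are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: run one successful m//g match, then a failing one,
# and report where pos ends up. A true $keep uses /gc instead of /g.
sub pos_after_failure {
    my ($text, $keep) = @_;
    my $first  = $text =~ /B+/g;     # succeeds on "ABBBBC"; pos becomes 5
    my $second = $keep ? $text =~ /X/gc : $text =~ /X/g;   # always fails here
    return pos($text);
}

for my $keep (0, 1) {
    my $offset = pos_after_failure("ABBBBC", $keep);
    my $label  = $keep ? "/gc" : "/g";
    print "${label}: ", (defined $offset ? $offset : "undef"), "\n";
}
# prints "/g: undef" then "/gc: 5"
```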

For example:

#! /usr/bin/env perl

use strict;
use warnings;

my $str = "ABBBBC";
my @replaced;
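# \G anchors at pos($str) (offset 0 on the first pass). $1 is the text before
# that offset, $2 the shortest non-empty run up to the next "BBB", $3 the rest.
while ($str =~ m/^(.*)\G(.+?)BBB(.*)$/g ) {
  push @replaced, $1 . $2 . "D" . $3;
  # Move the \G position one character to the right for the next pass.
  pos($str) = length($1) + 1;
}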

print "[", join("][" => @replaced), "]\n";

Output:

$ ./prog
[ADBC][ABDC]
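An alternative sketch, not taken from the original answer, finds each occurrence with a zero-width lookahead and edits a copy of the string with substr. The helper name substitute_each is made up here, and it assumes the search term is a literal string rather than a pattern. By my reading, the (.+?) in the program above needs at least one character before each BBB, so a BBB at offset 0 would be skipped, and two occurrences that are far apart may be reported more than once. The version below is meant to avoid both.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): returns one copy of
# $text per occurrence of the literal string $find, with only that
# occurrence replaced by $with.
sub substitute_each {
    my ($text, $find, $with) = @_;
    my @results;
    # (?=...) matches without consuming any text. Perl will not accept a
    # second empty match at the same offset, so the search moves on and
    # overlapping occurrences are all visited.
    while ($text =~ /(?=\Q$find\E)/g) {
        my $copy = $text;
        # After an empty match, pos($text) is the offset of the occurrence.
        substr($copy, pos($text), length($find), $with);
        push @results, $copy;
    }
    return @results;
}

print "[", join("][", substitute_each("ABBBBC",  "BBB", "D")), "]\n";
print "[", join("][", substitute_each("ABBBBBC", "BBB", "D")), "]\n";
# prints "[ADBC][ABDC]" then "[ADBBC][ABDBC][ABBDC]"
```

Working on a copy keeps $text unmodified, which matters because changing the string being searched would reset its pos.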
