Is there something like a counter variable in regular expression replace?


Question

Suppose I have a lot of matches, for example in multi-line mode, and I want to replace each of them with part of the match plus a counter number that increments.

I was wondering if any regex flavor has such a variable. I couldn't find one, but I seem to remember something like that exists...

I'm not talking about scripting languages in which you can use callbacks for replacement. It's about being able to do this in tools like RegexBuddy, Sublime Text, gskinner.com/RegExr, and so on, in much the same way you can refer to captured substrings with \1 or $1.

Recommended answer

FMTEYEWTK about Fancy Regexes

Ok, I'm going to go from the simple to the sublime. Enjoy!

Given this:

#!/usr/bin/perl

$_ = <<"End_of_G&S";
    This particularly rapid,
        unintelligible patter
    isn't generally heard,
        and if it is it doesn't matter!
End_of_G&S

my $count = 0;

Then this:

s{
     ( [\w']+ )
}{
    sprintf "(%s)[%d]", $1, ++$count;
}gsex;

produces this:

(This)[1] (particularly)[2] (rapid)[3],
    (unintelligible)[4] (patter)[5]
(isn't)[6] (generally)[7] (heard)[8], 
    (and)[9] (if)[10] (it)[11] (is)[12] (it)[13] (doesn't)[14] (matter)[15]!

Solution with Interpolated Code in Anonymous Array

Whereas this:

s/([\w']+)/#@{[++$count]}=$1/g;

produces this:

#1=This #2=particularly #3=rapid,
    #4=unintelligible #5=patter
#6=isn't #7=generally #8=heard, 
    #9=and #10=if #11=it #12=is #13=it #14=doesn't #15=matter!
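
The @{[ ... ]} construct is what lets code run inside an interpolated replacement: the square brackets build an anonymous array from the expression, and @{ } dereferences it so the result is interpolated like any other array. Here is a minimal sketch of the idiom on its own; the variable names and sample strings are made up for illustration and are not part of the answer above.

```perl
use strict;
use warnings;

my $count = 0;

# Each @{[ ... ]} evaluates its expression and interpolates the result.
my $line = "item @{[ ++$count ]}, item @{[ ++$count ]}";
print "$line\n";

# The same idiom inside a replacement, without needing the /e flag.
my $n = 0;
(my $numbered = "a b c") =~ s/(\w+)/#@{[ ++$n ]}=$1/g;
print "$numbered\n";
```

Because the replacement side of s/// is interpolated afresh for every match, the counter advances once per match.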

Solution with code in LHS instead of RHS

This puts the incrementation within the match itself:

s/  ( [\w']+ )  (?{ $count++ }) /#$count=$1/gx;

produces this:

#1=This #2=particularly #3=rapid,
    #4=unintelligible #5=patter
#6=isn't #7=generally #8=heard, 
    #9=and #10=if #11=it #12=is #13=it #14=doesn't #15=matter!

A Stuttering Stuttering Solution Solution Solution

This:

s{  ( [\w'] + )              }
 { join " " => ($1) x ++$count   }gsex;

generates this delightful answer:

This particularly particularly rapid rapid rapid,
    unintelligible unintelligible unintelligible unintelligible patter patter patter patter patter
isn't isn't isn't isn't isn't isn't generally generally generally generally generally generally generally heard heard heard heard heard heard heard heard, 
    and and and and and and and and and if if if if if if if if if if it it it it it it it it it it it is is is is is is is is is is is is it it it it it it it it it it it it it doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't matter matter matter matter matter matter matter matter matter matter matter matter matter matter matter!

Exploring Boundaries

There are more robust approaches to word boundaries that work for plural possessives (the previous approaches don't), but I suspect your mystery lies in getting the ++$count to fire, not with the subtleties of \b behavior.

I really wish people understood that \b isn't what they think it is. They always think it means there's white space or the edge of the string there. They never think of it as \w\W or \W\w transitions.

# same as using a \b before:
(?(?=\w) (?<!\w)  | (?<!\W) )

# same as using a \b after:
(?(?<=\w) (?!\w)  | (?!\W)  )

As you see, it's conditional depending on what it's touching. That's what the (?(COND)THEN|ELSE) clause is for.
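
To see the transitions for yourself, here is a small sketch that marks every position where \b matches and compares it with a commonly used lookaround spelling of the same idea. The sample string and variable names are assumptions chosen for illustration, and the lookaround form is the generic one, not the conditional patterns shown above.

```perl
use strict;
use warnings;

my $text = "isn't it";

# Put a bar at every position where \b matches.
(my $with_b = $text) =~ s/\b/|/g;

# The same positions written out as \w/\W transitions with lookarounds:
# either a word character behind and none ahead, or the reverse.
(my $with_lookaround = $text) =~ s/(?:(?<=\w)(?!\w)|(?<!\w)(?=\w))/|/g;

print "$with_b\n";
print "$with_lookaround\n";
```

Note the bars on either side of the apostrophe: there is no white space and no string edge there, only a switch between word and non-word characters.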

Which becomes the following for things like:

$_ = qq('Tis Paul's parents' summer-house, isn't it?\n);
my $count = 0;

s{
    (?(?=[\-\w']) (?<![\-\w'])  | (?<![^\-\w']) )
    ( [\-\w'] + )
    (?(?<=[\-\w']) (?![\-\w'])  | (?![^\-\w'])  )
}{
    sprintf "(%s)[%d]", $1, ++$count
}gsex;

print;

correctly prints:

('Tis)[1] (Paul's)[2] (parents')[3] (summer-house)[4], (isn't)[5] (it)[6]?

Worrying about Unicode

1960s-style ASCII is about 50 years out of date. Just as it's nearly always wrong whenever you see anyone write [a-z], it turns out that things like dashes and quotation marks shouldn't show up as literals in patterns, either. While we're at it, you probably don't want to use \w, because that includes numbers and underscores as well, not just alphabetics.
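
As a quick illustration of that point, the sketch below checks a few single characters against \w, \p{Alphabetic}, and [a-z]. The classify helper and the sample characters are hypothetical, invented for this example; it assumes the source file is saved as UTF-8, since it contains a literal é.

```perl
use strict;
use warnings;
use utf8;   # this source file contains a literal é

# Hypothetical helper: list which of the three classes accept one character.
sub classify {
    my ($ch) = @_;
    return join ",",
        ($ch =~ /\A\w\z/             ? "w"     : ()),
        ($ch =~ /\A\p{Alphabetic}\z/ ? "alpha" : ()),
        ($ch =~ /\A[a-z]\z/          ? "a-z"   : ());
}

# Print code points rather than the characters, to keep the output ASCII.
for my $ch ("x", "é", "7", "_") {
    printf "U+%04X: %s\n", ord($ch), classify($ch);
}
```

The digit and the underscore satisfy \w but not \p{Alphabetic}, while é is alphabetic yet falls outside [a-z].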

Imagine this string:

$_ = qq(\x{2019}Tis Ren\x{E9}e\x{2019}s great\x{2010}grandparents\x{2019} summer\x{2010}house, isn\x{2019}t it?\n);

which you could have as a literal with use utf8:

use utf8;
$_ = qq(’Tis Renée’s great‐grandparents’ summer‐house, isn’t it?\n);

This time I'll go at the pattern a bit differently, separating out my definition of terms from their execution to try to make it more readable and thence maintainable:

#!/usr/bin/perl -l
use 5.10.0;
use utf8;
use open qw< :std :utf8 >;
use strict;
use warnings qw< FATAL all >;
use autodie;

$_ = q(’Tis Renée’s great‐grandparents’ summer‐house, isn’t it?);

my $count = 0;

s{ (?<WORD> (?&full_word)  )

   # the rest is just definition
   (?(DEFINE)

     (?<word_char>   [\p{Alphabetic}\p{Quotation_Mark}] )

     (?<full_word>

             # next line won't compile cause
             # fears variable-width lookbehind
             ####  (?<! (?&word_char) )   )
             # so must inline it

         (?<! [\p{Alphabetic}\p{Quotation_Mark}] )

         (?&word_char)
         (?:
             \p{Dash}
           | (?&word_char)
         ) *

         (?!  (?&word_char) )
     )

   )   # end DEFINE declaration block

}{
    sprintf "(%s)[%d]", $+{WORD}, ++$count;
}gsex;

print;

That code when run produces this:

(’Tis)[1] (Renée’s)[2] (great‐grandparents’)[3] (summer‐house)[4], (isn’t)[5] (it)[6]?

Ok, so that may have been FMTEYEWTK about fancy regexes, but aren't you glad you asked? ☺
