Is there something like a counter variable in regular expression replace?

Question

Suppose I have a lot of matches, for example in multi-line mode, and I want to replace each one with part of the match plus a counter number that increments.

I was wondering if any regex flavor has such a variable. I couldn't find one, but I seem to remember something like that exists...

I'm not talking about scripting languages in which you can use callbacks for replacement. It's about being able to do this in tools like RegexBuddy, Sublime Text, gskinner.com/RegExr, ... much in the same way you can refer to captured substrings with \1 or $1.

Recommended answer

FMTEYEWTK about Fancy Regexes

Ok, I’m going to go from the simple to the sublime. Enjoy!

Given this:

#!/usr/bin/perl

$_ = <<"End_of_G&S";
    This particularly rapid,
        unintelligible patter
    isn't generally heard,
        and if it is it doesn't matter!
End_of_G&S

my $count = 0;

Then this:

s{
    \b ( [\w']+ ) \b
}{
    sprintf "(%s)[%d]", $1, ++$count;
}gsex;

produces this:

(This)[1] (particularly)[2] (rapid)[3],
    (unintelligible)[4] (patter)[5]
(isn't)[6] (generally)[7] (heard)[8], 
    (and)[9] (if)[10] (it)[11] (is)[12] (it)[13] (doesn't)[14] (matter)[15]!
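
The /e modifier is what makes this work: Perl evaluates the replacement side as code for every match, so ++$count runs once per substitution. A minimal sketch that wraps the same idea in a reusable sub follows; the name number_words is hypothetical, and the /r modifier it relies on assumes Perl 5.14 or later:

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $text with each word tagged "(word)[n]".
sub number_words {
    my ($text) = @_;
    my $count = 0;    # fresh counter for every call
    # /e evaluates the replacement as Perl code; /r returns the modified copy
    return $text =~ s{ \b ( [\w']+ ) \b }{ sprintf "(%s)[%d]", $1, ++$count }gexr;
}

print number_words("rapid, unintelligible patter"), "\n";
```

Because the counter is a lexical declared inside the sub, each call starts numbering again from 1.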

Interpolated Code in Anon Array Solution

Whereas this:

s/\b([\w']+)\b/#@{[++$count]}=$1/g;

produces this:

#1=This #2=particularly #3=rapid,
    #4=unintelligible #5=patter
#6=isn't #7=generally #8=heard, 
    #9=and #10=if #11=it #12=is #13=it #14=doesn't #15=matter!
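
The @{[ ... ]} construct builds an anonymous array from the enclosed expression and immediately dereferences it, which lets code run inside an interpolated string without needing /e. A small sketch outside of any regex (the variable names are only illustrative):

```perl
use strict;
use warnings;

my $n = 0;
# Each @{[ ... ]} evaluates its expression in list context, wraps the
# result in an anonymous array, and interpolates that array into the string.
my $label = "first=@{[ ++$n ]} second=@{[ ++$n ]} sum=@{[ 2 + 3 ]}";
print "$label\n";    # first=1 second=2 sum=5
```

The replacement side of s/// interpolates like a double-quoted string, which is why the same trick works there.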

Solution with code in LHS instead of RHS

This puts the incrementation within the match itself:

s/ \b ( [\w']+ ) \b (?{ $count++ }) /#$count=$1/gx;

which yields this:

#1=This #2=particularly #3=rapid,
    #4=unintelligible #5=patter
#6=isn't #7=generally #8=heard, 
    #9=and #10=if #11=it #12=is #13=it #14=doesn't #15=matter!
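
A (?{ ... }) block runs whenever the regex engine reaches it, so where it sits in the pattern matters. Here it comes after the final \b, the last thing that can fail, so it fires once per successful match. A minimal sketch of the same approach inside a sub; the name number_in_match is hypothetical, and the package variable with local is a cautious choice of mine rather than something the answer requires:

```perl
use strict;
use warnings;

our $count;    # package variable, visible to the embedded code block

# Hypothetical helper: number the words of $text using a (?{ ... }) block.
sub number_in_match {
    my ($text) = @_;
    local $count = 0;
    # The code block sits after the final \b, so it runs only once the
    # whole word has matched.
    $text =~ s/ \b ( [\w']+ ) \b (?{ $count++ }) /#$count=$1/gx;
    return $text;
}

print number_in_match("isn't generally heard"), "\n";
```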

A Stuttering Stuttering Solution Solution

This:

s{ \b ( [\w'] + ) \b             }
 { join " " => ($1) x ++$count   }gsex;

generates this delightful answer:

This particularly particularly rapid rapid rapid,
    unintelligible unintelligible unintelligible unintelligible patter patter patter patter patter
isn't isn't isn't isn't isn't isn't generally generally generally generally generally generally generally heard heard heard heard heard heard heard heard, 
    and and and and and and and and and if if if if if if if if if if it it it it it it it it it it it is is is is is is is is is is is is it it it it it it it it it it it it it doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't doesn't matter matter matter matter matter matter matter matter matter matter matter matter matter matter matter!
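
The stutter comes from the list-repetition operator: (LIST) x N repeats the parenthesized list N times, and join glues the copies together with spaces. A tiny sketch of just that step, using illustrative variable names:

```perl
use strict;
use warnings;

# ("patter") x 3 builds the list ("patter", "patter", "patter").
my @copies  = ("patter") x 3;
my $stutter = join " ", @copies;
print "$stutter\n";    # patter patter patter
```

In the substitution above the repeat count is ++$count, so the Nth word is repeated N times.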

Exploring Boundaries

There are more robust approaches to word boundaries that work for plural possessives (the previous approaches don’t), but I suspect your mystery lies in getting the ++$count to fire, not with the subtleties of \b behavior.

I really wish people understood that \b isn’t what they think it is. They always think it means there's white space or the edge of the string there. They never think of it as \w\W or \W\w transitions.

# same as using a \b before:
(?(?=\w) (?<!\w)  | (?<!\W) )

# same as using a \b after:
(?(?<=\w) (?!\w)  | (?!\W)  )

As you see, it's conditional depending on what it's touching. That’s what the (?(COND)THEN|ELSE) clause is for.

This becomes an issue with things like:

$_ = qq('Tis Paul's parents' summer-house, isn't it?\n);
my $count = 0;

s{
    (?(?=[\-\w']) (?<![\-\w'])  | (?<![^\-\w']) )
    ( [\-\w'] + )
    (?(?<=[\-\w']) (?![\-\w'])  | (?![^\-\w'])  )
}{
    sprintf "(%s)[%d]", $1, ++$count
}gsex;

print;

which correctly prints:

('Tis)[1] (Paul's)[2] (parents')[3] (summer-house)[4], (isn't)[5] (it)[6]?
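
For contrast, here is a short sketch of what the plain \b version from the earlier solutions captures on the same string. The variable names are illustrative; the point is that \b treats a leading or trailing apostrophe and the hyphen as non-word characters:

```perl
use strict;
use warnings;

my $text = qq('Tis Paul's parents' summer-house, isn't it?);

# Plain \b boundaries: edge apostrophes are dropped and the hyphen splits the word.
my @plain = $text =~ / \b ( [\w']+ ) \b /gx;

print join("|", @plain), "\n";    # Tis|Paul's|parents|summer|house|isn't|it
```

That gives seven fragments instead of the six words the custom boundaries find.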

Worrying about Unicode

1960s-style ASCII is about 50 years out of date. Just as whenever you see anyone write [a-z], it’s nearly always wrong, it turns out that things like dashes and quotation marks shouldn’t show up as literals in patterns, either. While we’re at it, you probably don’t want to use \w, because that includes numbers and underscores as well, not just alphabetics.
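
A quick sketch of both complaints, with illustrative variable names: an ASCII range rejects an accented letter that \p{Alphabetic} accepts, and \w lets through digits and underscores that \p{Alphabetic} rejects:

```perl
use strict;
use warnings;

my $name = "Ren\x{E9}e";    # "Renée", written with an escape to stay encoding-neutral

# An ASCII range misses the accented letter ...
my $ascii_ok = $name =~ /\A[a-zA-Z]+\z/ ? 1 : 0;
# ... while the Unicode property accepts it.
my $alpha_ok = $name =~ /\A\p{Alphabetic}+\z/ ? 1 : 0;

# \w is broader than letters: it also accepts digits and the underscore.
my $w_extra = "x_7" =~ /\A\w+\z/ ? 1 : 0;
my $a_extra = "x_7" =~ /\A\p{Alphabetic}+\z/ ? 1 : 0;

print "ascii=$ascii_ok alpha=$alpha_ok w=$w_extra alphabetic=$a_extra\n";
```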

Imagine this string:

$_ = qq(\x{2019}Tis Ren\x{E9}e\x{2019}s great\x{2010}grandparents\x{2019} summer\x{2010}house, isn\x{2019}t it?\n);

which you could have as a literal with use utf8:

use utf8;
$_ = qq(’Tis Renée’s great‐grandparents’ summer‐house, isn’t it?\n);

This time I’ll go at the pattern a bit differently, separating out my definition of terms from their execution to try to make it more readable and thence maintainable:

#!/usr/bin/perl -l
use 5.10.0;
use utf8;
use open qw< :std :utf8 >;
use strict;
use warnings qw< FATAL all >;
use autodie;

$_ = q(’Tis Renée’s great‐grandparents’ summer‐house, isn’t it?);

my $count = 0;

s{ (?<WORD> (?&full_word)  )

   # the rest is just definition
   (?(DEFINE)

     (?<word_char>   [\p{Alphabetic}\p{Quotation_Mark}] )

     (?<full_word>

             # next line won't compile cause
             # fears variable-width lookbehind
             ####  (?<! (?&word_char) )   )
             # so must inline it

         (?<! [\p{Alphabetic}\p{Quotation_Mark}] )

         (?&word_char)
         (?:
             \p{Dash}
           | (?&word_char)
         ) *

         (?!  (?&word_char) )
     )

   )   # end DEFINE declaration block

}{
    sprintf "(%s)[%d]", $+{WORD}, ++$count;
}gsex;

print;

That code when run produces this:

(’Tis)[1] (Renée’s)[2] (great‐grandparents’)[3] (summer‐house)[4], (isn’t)[5] (it)[6]?

Ok, so that may have been FMTEYEWTK about fancy regexes, but aren’t you glad you asked? ☺
