Perl: how to use string variables as search pattern and replacement in regex

Question

I want to use string variables for both the search pattern and the replacement in a regex substitution. The expected output looks like this:

$ perl -e '$a="abcdeabCde"; $a=~s/b(.)d/_$1$1_/g; print "$a\n"'
a_cc_ea_CC_e

But when I moved the pattern and replacement to a variable, $1 was not evaluated.

$ perl -e '$a="abcdeabCde"; $p="b(.)d"; $r="_\$1\$1_"; $a=~s/$p/$r/g; print "$a\n"'
a_$1$1_ea_$1$1_e

When I use the /ee modifier, it gives errors.

$ perl -e '$a="abcdeabCde"; $p="b(.)d"; $r="_\$1\$1_"; $a=~s/$p/$r/gee; print "$a\n"'
Scalar found where operator expected at (eval 1) line 1, near "$1$1"
    (Missing operator before $1?)
Bareword found where operator expected at (eval 1) line 1, near "$1_"
    (Missing operator before _?)
Scalar found where operator expected at (eval 2) line 1, near "$1$1"
    (Missing operator before $1?)
Bareword found where operator expected at (eval 2) line 1, near "$1_"
    (Missing operator before _?)
aeae

What am I missing here?

I write both $p and $r myself. What I need is to do multiple similar regex replacements without touching the Perl code, so $p and $r have to be in a separate data file. I hope this file can be used with C++/Python code later. Here are some examples of $p and $r.

^(.*\D)?((19|18|20)\d\d)年   $1$2<digits>年
^(.*\D)?(0\d)年  $1$2<digits>年
([TKZGD])(\d+)/(\d+)([^\d/])    $1$2<digits>$3<digits>$4
([^/TKZGD\d])(\d+)/(\d+)([^/\d])    $1$3分之$2$4

Recommended answer

With $p="b(.)d"; you are getting a string with literal characters b(.)d. In general, regex patterns are not preserved in quoted strings and may not have their expected meaning in a regex. However, see Note at the end.

This is what the qr operator is for: $p = qr/b(.)d/; compiles the pattern into a regular expression object instead of leaving it as a plain string.
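As a minimal sketch of the qr approach, assuming the pattern text arrives as a plain string (for instance, read from a data file), it can be interpolated into qr once and then reused. The variable names and the bracketed replacement are illustrative only.

```perl
use strict;
use warnings;

# Pattern text as it might come from a data file (single quotes: no interpolation).
my $pattern_text = 'b(.)d';

# Compile it once; $p is now a regex object, not a plain string.
my $p = qr/$pattern_text/;

# The compiled regex can be used directly inside s///.
(my $text = 'abcdeabCde') =~ s/$p/[$1]/g;
print "$text\n";    # prints: a[c]ea[C]e

# Modifiers can be attached at compile time, e.g. case-insensitive matching.
my $p_ci = qr/$pattern_text/i;
(my $upper = 'ABCDE') =~ s/$p_ci/[$1]/g;
print "$upper\n";   # prints: A[C]E
```

A fixed replacement such as [$1] works here because it is written in the code itself; the difficulty discussed next arises only when the replacement text also comes from a variable.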

As for the replacement part and /ee, the problem is that $r is first evaluated, to yield _$1$1_, which is then evaluated as code. Alas, that is not valid Perl code. The _ are barewords and even $1$1 itself isn't valid (for example, $1 . $1 would be).

The provided examples of $r have $Ns mixed with text in various ways. One way to parse this is to extract all $N and all else into a list that maintains their order from the string. Then, that can be processed into a string that will be valid code. For example, we need

'$1_$2$3other'  -->  $1 . '_' . $2 . $3 . 'other'

which is valid Perl code that can be evaluated.

Breaking the string up is made easier by split, which returns the separators as well when the separator pattern contains a capturing group.

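sub repl {
    my ($r) = @_;

    # Split on $N placeholders, keeping them; drop empty strings but keep a literal "0"
    my @terms = grep { length } split /(\$\d+)/, $r;

    # $N stays as is; any other text becomes a single-quoted literal
    # (assumes the text itself contains no single quotes or backslashes)
    return join ' . ', map { /^\$\d+$/ ? $_ : q(') . $_ . q(') } @terms;
}

# First /e calls repl($r) to build the expression; second /e evaluates it
$var =~ s/$p/repl($r)/gee;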

With a capturing group /(...)/ in split's pattern, the separators are returned as part of the list. Thus this extracts from $r an array of terms which are either $N or other text, in their original order and with everything kept (other than trailing empty strings, which split drops). The list can include empty strings (a leading one, or one between adjacent $N), so those need to be filtered out.
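To make the behaviour of split concrete, here is a small sketch that shows the raw list for the sample string used earlier, and the list after the empty strings are filtered out. The variable names are illustrative.

```perl
use strict;
use warnings;

my $r = '$1_$2$3other';

# The capturing group in the pattern makes split return the separators too.
my @raw = split /(\$\d+)/, $r;
# @raw is: '', '$1', '_', '$2', '', '$3', 'other'
# (an empty leading field, and an empty field between the adjacent $2 and $3)

# Filter on length rather than truth, so that a literal "0" would survive.
my @terms = grep { length } @raw;

print scalar(@raw), " raw fields\n";    # prints: 7 raw fields
print join(' | ', @terms), "\n";        # prints: $1 | _ | $2 | $3 | other
```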

Then every term other than $Ns is wrapped in '...', so when they are all joined by . we get a valid Perl expression, as in the example above.

Then /ee will have this function return the string (such as above), and evaluate it as valid code.
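Putting the pieces together, the following self-contained sketch runs the approach on the input from the question and on one of the sample rules. It uses a variant of repl that matches multi-digit $N and filters on length; it assumes the replacement text contains no single quotes or backslashes.

```perl
use strict;
use warnings;

# Turn a replacement template such as '_$1$1_' into a Perl expression string.
sub repl {
    my ($r) = @_;
    my @terms = grep { length } split /(\$\d+)/, $r;
    return join ' . ', map { /^\$\d+$/ ? $_ : "'$_'" } @terms;
}

# The expression that the second /e will evaluate:
my $expr = repl('$1_$2$3other');
print "$expr\n";    # prints: $1 . '_' . $2 . $3 . 'other'

# The case from the question.
my $p = qr/b(.)d/;
my $r = '_$1$1_';
(my $text = 'abcdeabCde') =~ s/$p/repl($r)/gee;
print "$text\n";    # prints: a_cc_ea_CC_e

# One of the sample rules from the data file.
my $p2 = qr{([TKZGD])(\d+)/(\d+)([^\d/])};
my $r2 = '$1$2<digits>$3<digits>$4';
(my $text2 = 'K12/34x') =~ s/$p2/repl($r2)/gee;
print "$text2\n";   # prints: K12<digits>34<digits>x
```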

We are told that the safety of using /ee on external input is not a concern here. Still, this is something to keep in mind. See this post, provided by Håkon Hægland in a comment. Along with the discussion it also directs us to String::Substitution. Its use is demonstrated in this post. Another way to approach this is with replace from Data::Munge.
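For cases where evaluating text from a data file is a concern, one possible eval-free sketch in core Perl is shown here. It is not the API of String::Substitution or Data::Munge; the helper names expand_template and substitute_all are hypothetical, and the approach assumes that only $N placeholders need to be supported in the replacement text.

```perl
use strict;
use warnings;

# Hypothetical helper: fill $N placeholders in a template from a list of captures.
sub expand_template {
    my ($template, @caps) = @_;
    $template =~ s{\$(\d+)}{
        my $n = $1;
        ($n >= 1 && defined $caps[$n - 1]) ? $caps[$n - 1] : ''
    }ge;
    return $template;
}

# Hypothetical helper: behaves like s/$re/$template/g with $N expanded,
# but never evaluates the template as Perl code.
sub substitute_all {
    my ($text, $re, $template) = @_;
    my $orig = $text;    # @- and @+ hold offsets into the string being matched
    $text =~ s{$re}{
        my @caps = map {
            defined $-[$_] ? substr($orig, $-[$_], $+[$_] - $-[$_]) : undef
        } 1 .. $#+;
        expand_template($template, @caps)
    }ge;
    return $text;
}

print substitute_all('abcdeabCde', qr/b(.)d/, '_$1$1_'), "\n";
# prints: a_cc_ea_CC_e
```

The single /e here only runs code written in the program itself; the template is treated purely as data, so a malicious or malformed replacement string cannot execute anything.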

For more discussion of /ee see this post, with several useful answers.

Note on using "b(.)d" for a regex pattern

In this case, with parentheses and a dot, the special meaning is maintained. Thanks to kangshiyin for an early mention of this, and to Håkon Hægland for confirming it. However, this is a special case. Double-quoted strings break many patterns because escape processing and interpolation happen before the regex engine sees them. For example, "\w" is just an escaped w (an unrecognized escape), so the string holds a plain w. Single quotes should work, as there is no interpolation. Still, strings intended for use as regex patterns are best formed using qr, as we then get a true regex. Then all modifiers may be used as well.
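The difference between the three ways of writing a pattern can be seen in a short sketch. The input string and variable names are illustrative; the warning that Perl would normally issue for the unrecognized escape is silenced locally so the example runs cleanly.

```perl
use strict;
use warnings;

# Double quotes: "\w" is not a known string escape, so the backslash is dropped.
my $dq = do { no warnings; "\w+" };    # the string is 'w+'

# Single quotes: the backslash is kept, so the regex engine later sees \w+.
my $sq = '\w+';

# qr: a compiled regex, written with regex syntax from the start.
my $re = qr/\w+/;

my $input = 'foo_1 www';
my ($m_dq) = $input =~ /($dq)/;    # matches literal w characters: 'www'
my ($m_sq) = $input =~ /($sq)/;    # matches word characters: 'foo_1'
my ($m_re) = $input =~ /($re)/;    # matches word characters: 'foo_1'

print "$m_dq $m_sq $m_re\n";       # prints: www foo_1 foo_1
```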
