Is $1 or $& faster for replacing a matched string using s/// in Perl?

Question

Which of these is cheaper?

use 5.010;   # enables say

$_ = 'abc123def';

s/\d+/$&*2/e;
say;

s/(\d+)/$1*2/e;
say;

Answer

Executive summary: use 5.010's /p instead. For a single match or substitution, $& performs about the same as the alternatives, but the entire program can suffer from it. Its slowdown is long-range, not local.

Here's a benchmark with 5.010, which I suspect you are using since you have say in there. Note that 5.010 has a new /p flag that supplies a ${^MATCH} variable that acts like $& but for only one instance of the match or substitution operator.
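
As a quick illustration of the /p flag described above, here is a minimal sketch, separate from the benchmark. The variable names $text and $doubled are made up for this example; ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} are the per-operator counterparts of $`, $&, and $'.

```perl
use 5.010;
use strict;
use warnings;

my $text = 'abc123def';

# /p asks Perl to preserve the match data for this one operator only.
if ( $text =~ /\d+/p ) {
    say "prematch:  ${^PREMATCH}";     # abc
    say "match:     ${^MATCH}";        # 123
    say "postmatch: ${^POSTMATCH}";    # def
}

# The same flag works with s///; with /e the replacement is evaluated as code.
( my $doubled = $text ) =~ s/\d+/${^MATCH} * 2/pe;
say $doubled;                          # abc246def
```

Nothing in this snippet mentions $&, so it should not trigger the program-wide penalty discussed in this answer.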

As with any benchmark, I compare with a control to set the baseline so I know how much time the boring bits take up. Also, this benchmark has a trap: you can't use $& in the code or every substitution suffers. First run the benchmark without the $& sub:

use 5.010;

use Benchmark qw(cmpthese);

# Note: the lexical "my $_" used below needs Perl 5.10 through 5.22;
# it was removed in 5.24.
cmpthese(1_000_000, {
   'control' => sub { my $_ = 'abc123def'; s/\d+/246/ },
   'control-e' => sub { my $_ = 'abc123def'; s/\d+/123*2/e;  },
   '/p'      => sub { my $_ = 'abc123def'; s/\d+/${^MATCH}*2/pe },
   # '$&'      => sub { my $_ = 'abc123def'; s/\d+/$&*2/e },
   '()'      => sub { my $_ = 'abc123def'; s/(\d+)/$1*2/e },
});

On my MacBook Air running Leopard and a vanilla Perl 5.10:

              Rate        /p        () control-e   control
/p         70621/s        --       -1%      -58%      -78%
()         71124/s        1%        --      -58%      -78%
control-e 168350/s      138%      137%        --      -48%
control   322581/s      357%      354%       92%        --

Notice the big slowdown with the /e option, which I've added just for giggles.

Now, I'll uncomment the $& branch, and I see that everything is slower, although /p seems to shine here:

              Rate        ()        $&        /p control-e   control
()         68353/s        --       -4%       -7%      -58%      -74%
$&         70872/s        4%        --       -3%      -56%      -73%
/p         73421/s        7%        4%        --      -54%      -72%
control-e 161290/s      136%      128%      120%        --      -39%
control   262467/s      284%      270%      257%       63%        --

This is an odd benchmark. If I don't include the control-e sub, the situation looks different, which demonstrates another concept of benchmarking: it's not absolute and everything that you do matters in the final results. In this run, $& looks slightly faster:

            Rate      ()      /p      $& control
()       69686/s      --     -3%     -3%    -72%
/p       72098/s      3%      --     -0%    -71%
$&       72150/s      4%      0%      --    -71%
control 251256/s    261%    248%    248%      --

So, I ran it with control-e again, and the results move around a little:

              Rate        ()        /p        $& control-e   control
()         68306/s        --       -3%       -4%      -55%      -74%
/p         70175/s        3%        --       -1%      -54%      -73%
$&         71023/s        4%        1%        --      -53%      -73%
control-e 151976/s      122%      117%      114%        --      -41%
control   258398/s      278%      268%      264%       70%        --

The speed differences between them aren't impressive either. Anything under about 7% isn't that significant, since a difference that small comes from the accumulation of errors through the repeated calls to the sub (try it sometime by benchmarking the same code against itself). The slight differences you see come merely from the benchmarking infrastructure. With these numbers, each technique is virtually the same speedwise. You can't just run your benchmark once. You have to run it several times to see if you get repeatable results.
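
To try the "same code against itself" experiment mentioned above, a minimal sketch could look like the following. The labels copy-a and copy-b and the variable names are invented for this example, and it uses a plain lexical $s rather than my $_ so that it also runs on newer perls. Whatever percentage cmpthese reports between the two rows is pure measurement noise, because both entries are the same sub.

```perl
use 5.010;
use strict;
use warnings;
use Benchmark qw(cmpthese);

# One sub, registered under two labels: the work is identical,
# so any reported difference between the rows is noise.
my $code = sub { my $s = 'abc123def'; $s =~ s/(\d+)/$1*2/e; $s };

# A negative count tells Benchmark to run each entry for at least
# that many CPU seconds instead of a fixed number of iterations.
my $rows = cmpthese( -1, {
    'copy-a' => $code,
    'copy-b' => $code,
} );
```

Running this several times gives a rough feel for how large a difference has to be on your machine before it means anything.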

Note that although /p looks very slightly slower than $&, part of that is because $& cheats by slowing down everyone, /p included. Notice the slowdown in the control too. This is one of the reasons that benchmarking is so dangerous. You can easily mislead yourself with the results if you don't think hard about why they are wrong (see the full screed in Mastering Perl, where I devote an entire chapter to this).

This simple and naïve benchmark excludes the killer disfeature of $&, though. Let's modify the benchmark to handle an additional match. First, the baseline with no $& effects, where I've constructed a situation where $& would have to copy about 1,000 characters in an additional match operator:

use 5.010;

use Benchmark qw(cmpthese);

$main::long = ( 'a' x 1_000 ) . '123' . ( 'b' x 1_000 );

cmpthese(1_000_000, {
   'control' => sub { my $_ = 'abc123def'; s/\d+/246/; $main::long =~ m/^a+123/; },
   'control-e' => sub { my $_ = 'abc123def'; s/\d+/123*2/e; $main::long =~ m/^a+123/; },
   '/p'      => sub { my $_ = 'abc123def'; s/\d+/${^MATCH}*2/pe; $main::long =~ m/^a+123/; },
   #'$&'      => sub { my $_ = 'abc123def'; s/\d+/$&*2/e; $main::long =~ m/^a+123/;},
   '()'      => sub { my $_ = 'abc123def'; s/(\d+)/$1*2/e; $main::long =~ m/^a+123/; },
});

Everything is much slower than before, but that's what happens when you do more work, and again the two techniques are within each other's noise:

              Rate        ()        /p control-e   control
()         52826/s        --       -4%      -49%      -63%
/p         54885/s        4%        --      -47%      -61%
control-e 103734/s       96%       89%        --      -27%
control   141243/s      167%      157%       36%        --

Now, I uncomment the $& sub:

              Rate        ()        $&        /p control-e   control
()         50607/s        --       -1%       -3%      -43%      -59%
$&         50968/s        1%        --       -2%      -43%      -58%
/p         52274/s        3%        3%        --      -41%      -57%
control-e  89206/s       76%       75%       71%        --      -27%
control   122100/s      141%      140%      134%       37%        --

That result is very interesting. Now /p, still penalized by the cheating $&, is slightly faster (though still within the noise), even though everyone suffers significantly.

Again, be very careful with these results. This does not mean that $& will have the same effect in every script. You might see less of a slowdown, or more of it, depending on the number of matches, the particular regexes, and so on. What this, or any, benchmark shows is an idea, not a decision. You still have to figure out how this idea affects your particular situation.
