在数组中搜索标量时,Perl的smartmatch运算符有多快? [英] How fast is Perl's smartmatch operator when searching for a scalar in an array?

查看:85
本文介绍了在数组中搜索标量时,Perl的smartmatch运算符有多快?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想在不变的数组中反复搜索值.

I want to repeatedly search for values in an array that does not change.

到目前为止,我一直是这样进行的:我将值放入一个哈希(这样我就拥有一个数组和一个具有基本上相同内容的哈希),然后使用exists搜索该哈希.

So far, I have been doing it this way: I put the values in a hash (so I have an array and a hash with essentially the same contents) and I search the hash using exists.

我不喜欢有两个不同的变量(数组和哈希)都存储相同的东西;但是,散列搜索起来要快得多.

I don't like having two different variables (the array and the hash) that both store the same thing; however, the hash is much faster for searching.

我发现Perl 5.10中有一个~~(智能匹配)运算符.在数组中搜索标量时,效率如何?

I found out that there is a ~~ (smartmatch) operator in Perl 5.10. How efficient is it when searching for a scalar in an array?

推荐答案

如果要在数组中搜索单个标量,则可以使用

If you want to search for a single scalar in an array, you can use List::Util's first subroutine. It stops as soon as it knows the answer. I don't expect this to be faster than a hash lookup if you already have the hash, but when you consider creating the hash and having it in memory, it might be more convenient for you to just search the array you already have.

至于智能匹配运算符的智能,如果您想了解它的智能程度,请对其进行测试. :)

As for the smarts of the smart-match operator, if you want to see how smart it is, test it. :)

您至少要检查三种情况.最坏的情况是您要查找的每个元素都在末尾.最好的情况是,要查找的每个元素都在开头.可能的情况是,您要查找的元素平均位于中间.

There are at least three cases you want to examine. The worst case is that every element you want to find is at the end. The best case is that every element you want to find is at the beginning. The likely case is that the elements you want to find average out to being in the middle.

现在,在我开始进行此基准测试之前,我希望如果智能匹配能够短路(并且可以短路,请参见

Now, before I start this benchmark, I expect that if the smart match can short circuit (and it can; its documented in perlsyn), that the best case times will stay the same despite the array size, while the other ones get increasingly worse. If it can't short circuit and has to scan the entire array every time, there should be no difference in the times because every case involves the same amount of work.

这是一个基准:

#!perl
use 5.12.2;
use strict;
use warnings;

use Benchmark qw(cmpthese);

my @hits = qw(A B C);
my @base = qw(one two three four five six) x ( $ARGV[0] || 1 );

my @at_end       = ( @base, @hits );
my @at_beginning = ( @hits, @base );

my @in_middle = @base;
splice @in_middle, int( @in_middle / 2 ), 0, @hits;

my @random = @base;
foreach my $item ( @hits ) {
    my $index = int rand @random;
    splice @random, $index, 0, $item;
    }

sub count {
    my( $hits, $candidates ) = @_;

    my $count;
    foreach ( @$hits ) { when( $candidates ) { $count++ } }
    $count;
    }

cmpthese(-5, {
    hits_beginning => sub { my $count = count( \@hits, \@at_beginning ) },
    hits_end       => sub { my $count = count( \@hits, \@at_end ) },
    hits_middle    => sub { my $count = count( \@hits, \@in_middle ) },
    hits_random    => sub { my $count = count( \@hits, \@random ) },
    control        => sub { my $count = count( [], [] ) },
  }
);

以下是各个部分的工作方式.请注意,这是两个轴上的对数图,因此插入线的斜率并不像它们看起来的那么紧密:

Here's how the various parts did. Note that this is a logarithmic plot on both axes, so the slopes of the plunging lines aren't as close as they look:

因此,看起来好像智能匹配运算符有点聪明,但这并没有真正帮助您,因为您可能仍然需要扫描整个阵列.您可能事先不知道在哪里可以找到您的元素.我希望即使您必须为此放弃一些内存,哈希也将执行与最佳情况智能匹配相同的操作.

So, it looks like the smart match operator is a bit smart, but that doesn't really help you because you still might have to scan the entire array. You probably don't know ahead of time where you'll find your elements. I expect a hash will perform the same as the best case smart match, even if you have to give up some memory for it.

好的,因此,智能匹配是智能乘以2的时间,但是真正的问题是我应该使用它吗?".另一种选择是哈希查找,这一直困扰着我,我没有考虑过这种情况.

Okay, so the smart match being smart times two is great, but the real question is "Should I use it?". The alternative is a hash lookup, and it's been bugging me that I haven't considered that case.

与任何基准测试一样,在开始实际测试结果之前,我会先思考结果可能是什么.我希望,如果我已经有了哈希,那么查找一个值将会很快.那件事不是问题.我对还没有哈希的情况更感兴趣.我多快可以使哈希和查找成为键?我希望它的表现不是很好,但是它仍然比最差情况的智能比赛还好吗?

As with any benchmark, I start off thinking about what the results might be before I actually test them. I expect that if I already have the hash, looking up a value is going to be lightning fast. That case isn't a problem. I'm more interested in the case where I don't have the hash yet. How quickly can I make the hash and lookup a key? I expect that to perform not so well, but is it still better than the worst case smart match?

不过,在查看基准之前,请记住,几乎没有足够的信息说明仅通过查看数字就应使用哪种技术.问题的上下文选择了最佳技术,而不是最快的,无上下文的微基准测试.考虑几种情况,这些情况会选择不同的技术:

Before you see the benchmark, though, remember that there's almost never enough information about which technique you should use just by looking at the numbers. The context of the problem selects the best technique, not the fastest, contextless micro-benchmark. Consider a couple of cases that would select different techniques:

  • 您有一个要重复搜索的数组
  • 您总是得到一个只需搜索一次的新数组
  • 您得到的阵列非常大,但内存有限

现在,请记住这些,我将添加到以前的程序中:

Now, keeping those in mind, I add to my previous program:

my %old_hash = map {$_,1} @in_middle; 

cmpthese(-5, {
    ...,
    new_hash       => sub { 
        my %h = map {$_,1} @in_middle; 
        my $count = 0;
        foreach ( @hits ) { $count++ if exists $h{$_} }
        $count;
        },
    old_hash       => sub { 
        my $count = 0;
        foreach ( @hits ) { $count++ if exists $old_hash{$_} }
        $count;
        },
    control_hash   => sub { 
        my $count = 0;
        foreach ( @hits ) { $count++ }
        $count;
        },
    }
);

这是情节.颜色有点难以区分.最下一行是您必须在每次要搜索哈希值时创建哈希值的情况.真是可怜.最高的两行(绿色)是散列(实际上没有散列)和现有散列查找的控件.这是一个对数/对数图;这两种情况甚至比智能匹配控件(仅调用子例程)还要快.

Here's the plot. The colors are a bit difficult to distinguish. The lowest line there is the case where you have to create the hash any time you want to search it. That's pretty poor. The highest two (green) lines are the control for the hash (no hash actually there) and the existing hash lookup. This is a log/log plot; those two cases are faster than even the smart match control (which just calls a subroutine).

还有其他一些注意事项. 随机"情况的行略有不同.这是可以理解的,因为每个基准测试(因此,每运行一次数组缩放一次)都会将命中元素随机放置在候选数组中.有些运行将它们放得早一些,有些则要稍晚一些,但是由于我每次整个程序运行都只创建一次@random数组,因此它们会移动一些.这意味着生产线上的颠簸并不重要.如果我尝试所有位置并求平均,我希望随机"行与中间"行相同.

There are a few other things to note. The lines for the "random" case are a bit different. That's understandable because each benchmark (so, once per array scale run) randomly places the hit elements in the candidate array. Some runs put them a bit earlier and some a bit later, but since I only make the @random array once per run of the entire program, they move around a bit. That means that the bumps in the line aren't significant. If I tried all positions and averaged, I expect that "random" line to be the same as the "middle" line.

现在,看看这些结果,我想说的是,智能匹配在最坏的情况下比哈希查找在最坏的情况下快得多.那讲得通.要创建哈希,我必须访问数组的每个元素,并创建哈希,这需要大量复制.智能匹配无法复制.

Now, looking at these results, I'd say that a smart-match is much faster in its worst case than the hash lookup is in its worst case. That makes sense. To create a hash, I have to visit every element of the array and also make the hash, which is a lot of copying. There's no copying with the smart match.

这是另外一个案例,尽管我不会进行检查.哈希何时变得比智能匹配更好?也就是说,何时创建散列的开销在重复搜索中足够分散以至于散列是更好的选择?

Here's a further case I won't examine though. When does the hash become better than the smart match? That is, when does the overhead of creating the hash spread out enough over repeated searches that the hash is the better choice?

这篇关于在数组中搜索标量时,Perl的smartmatch运算符有多快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆