如何从Perl数组中随机获取n个元素? [英] How can I take n elements at random from a Perl array?

查看:832
本文介绍了如何从Perl数组中随机获取n个元素?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个数组,A = [a1,a2,a3,...aP],大小为P.我必须对数组A中的q元素进行采样.

I have an array, A = [a1,a2,a3,...aP] with size P. I have to sample q elements from array A.

我计划使用具有q个迭代的循环,并在每次迭代中从A中随机选择一个元素.但是,如何确定每次选择的采摘编号会有所不同?

I plan to use a loop with q iterations, and randomly pick a elment from A at each iteration. But how can I make sure that the picked number will be different at each iteration?

推荐答案

其他答案都涉及改组数组,即O(n). 这意味着修改原始数组(破坏性)或复制原始数组(内存密集型).

The other answers all involve shuffling the array, which is O(n). It means modifying the original array (destructive) or copying the original array (memory intensive).

提高内存效率的第一种方法不是改组原始数组,而是改组索引数组.

The first way to make it more memory efficient is not to shuffle the original array but to shuffle an array of indexes.

# Shuffled list of indexes into @deck
my @shuffled_indexes = shuffle(0..$#deck);

# Get just N of them.
my @pick_indexes = @shuffled_indexes[ 0 .. $num_picks - 1 ];  

# Pick cards from @deck
my @picks = @deck[ @pick_indexes ];

它至少与@deck的内容无关,但仍具有O(nlogn)性能和O(n)内存.

It is at least independent of the content of the @deck, but its still O(nlogn) performance and O(n) memory.

一种更有效的算法(不一定更快,取决于您的数组的大小)是查看数组的每个元素,并确定是否将其放入数组.这类似于如何您可以从文件中选择一条随机行,而无需将整个文件读入内存中,每行被选中的几率为1/N,其中N为行号.因此,第一行的机会是1/1(通常会被选中).下一个有一个1/2.然后1/3,依此类推.每个选择都会覆盖上一个选择.这样每行就有1/total_lines个机会.

A more efficient algorithm (not necessarily faster, depends on now big your array is) is to look at each element of the array and decide if it's going to make it into the array. This is similar to how you select a random line from a file without reading the whole file into memory, each line has a 1/N chance of being picked where N is the line number. So the first line has a 1/1 chance (it's always picked). The next has a 1/2. Then 1/3 and so on. Each pick will overwrite the previous pick. This results in each line having a 1/total_lines chance.

您可以自己解决.一个单行文件的机会为1/1,因此始终会选择第一个文件.两行文件...第一行有1/1/1/2的存活机会,即1/2,第二行有1/2/1/2的机会.对于三行文件...第一行有1/1的几率被选中,然后有1/2 * 2/3的生存几率是2/6或1/3.依此类推.

You can work it out for yourself. A one line file has a 1/1 chance so the first one is always picked. A two line file... the first line has a 1/1 then a 1/2 chance of surviving, which is 1/2, and the second line has a 1/2 chance. For a three line file... the first line has a 1/1 chance of being picked, then a 1/2 * 2/3 chance of surviving which is 2/6 or 1/3. And so on.

该算法的速度为O(n),它一次遍历无序数组,并且不消耗过多的内存来存储选择.

The algorithm is O(n) for speed, it iterates through an unordered array once, and does not consume any more memory than is needed to store the picks.

稍作修改,该方法适用于多个选择.代替1/$position的机会是$picks_left / $position.每次选择成功时,您都会递减$ picks_left.您的工作从高位到低位.与以前不同,您不会覆盖.

With a little modification, this works for multiple picks. Instead of a 1/$position chance, it's $picks_left / $position. Each time a pick is successful, you decrement $picks_left. You work from the high position to the low one. Unlike before, you don't overwrite.

my $picks_left = $picks;
my $num_left = @$deck;
my @picks;
my $idx = 0;
while($picks_left > 0 ) {  # when we have all our picks, stop
    # random number from 0..$num_left-1
    my $rand = int(rand($num_left));

    # pick successful
    if( $rand < $picks_left ) {
        push @result, $deck->[$idx];
        $picks_left--;
    }

    $num_left--;
    $idx++;
}

这是 perl5i如何实现其pick方法(接下来释放).

要直观地理解为什么这样做,请以从4元素列表中选择2为例.每个人都有被拣选的机会为1/2.

To understand viscerally why this works, take the example of picking 2 from a 4 element list. Each should have a 1/2 chance of being picked.

1. (2 picks, 4 items):         2/4 = 1/2

足够简单.下一个元素有1/2的几率表示一个元素已经被拾取,在这种情况下,它的几率是1/3.否则,它的机会是2/3.做数学...

Simple enough. Next element has a 1/2 chance that an element will already have been picked, in which case it's chances are 1/3. Otherwise its chances are 2/3. Doing the math...

2. (1 or 2 picks,  3 items):   (1/3 * 1/2) + (2/3 * 1/2) = 3/6 = 1/2

接下来有1/4的机会会同时选择两个元素(1/2 * 1/2),然后就没有机会了; 1/2的机会只有一个被选中,那么它就有1/2的机会;剩下的1/4表示没有物品被拣选,在这种情况下为2/2.

Next has a 1/4 chance that both elements will already be picked (1/2 * 1/2), then it has no chance; 1/2 chance that only one will be picked, then it has 1/2; and the remaining 1/4 that no items will be picked in which case it's 2/2.

3. (0, 1 or 2 picks, 2 items): (0/2 * 1/4) + (1/2 * 2/4) + (2/2 * 1/4) = 2/8 + 1/4 = 1/2

最后,对于最后一项,前一项是最后一项选择,为1/2.

Finally, for the last item, there's a 1/2 the previous took the last pick.

4. (0 or 1 pick, 1 items):     (0/1 * 2/4) + (1/1 * 2/4) = 1/2

并非完全是证据,但对于说服自己很有用.

Not exactly a proof, but good for convincing yourself it works.

这篇关于如何从Perl数组中随机获取n个元素?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆