编程珍珠 - 随机选择算法 [英] Programing Pearls - Random Select algorithm

查看:92
本文介绍了编程珍珠 - 随机选择算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

第120页编程珠玑的第1版presents这种算法选择M个等概率的随机元素出N个整数的人口。

Page 120 of Programming Pearls 1st edition presents this algorithm for selecting M equally probable random elements out of a population of N integers.

InitToEmpty
Size := 0
While Size < M do
  T := RandInt(1,N)
  if not Member(T)    
     Insert(T)
     Size := Size + 1

据说,成员测试预期数小于2M,只要M&其中; N / 2。

It is stated that the expected number of Member tests is less than 2M, as long as M < N/2.

我想知道如何来证明这一点,但我的算法分析的背景是失败了我。

I'd like to know how to prove it, but my algorithm analysis background is failing me.

据我所知,接近m为N,则该程序不再需要,因为结果集将有更多的元素和RandInt的可能性选择现有的人会成比例增加。

I understand that the closer M is to N, the longer the program will take, because the result set will have more elements and the likelihood of RandInt selecting an existing one will increase proportionally.

您可以帮我找出这个证明?

Can you help me figuring out this proof?

推荐答案

我不是一个数学精灵,但我会给它一个粗略的镜头。这是不能保证,虽然是正确的。

I am not a math wizard, but I will give it a rough shot. This is NOT guaranteed to be right though.

有关的M每增加一个成员,你挑一个数字,看看它的存在,如果是添加它。否则,你再试一次。尝试一些,直到你成功了被称为几何概率分布。

For each additional member of M, you pick a number, see if it's there, and if is add it. Otherwise, you try again. Trying something until you're successful is called a geometric probability distribution.

http://en.wikipedia.org/wiki/Geometric_distribution

所以,你正在运行中号的几何试验。各试验已预期值1 / P,所以将采取预期的1 /对试图得到一个号码不是已经在熔点:是N减去我们已经从M个中加入用N(即多少取消拾取物品分割号码的数目/共8条)。因此对于第四数目,p值=(N -3)/ N,它是采摘一个未使用的号码的概率,所以风镐对第三数目的期望的数量为N / N-3。

So you are running M geometric trials. Each trial has expected value 1/p, so will take expected 1/p tries to get a number not already in M. p is N minus the number of numbers we've already added from M divided by N (i.e. how many unpicked items / total items). So for the fourth number, p = (N -3) / N, which is the probability of picking an unused number, so the expected number of picks for the third number is N / N-3 .

的运行时间的预期值是所有这些加在一起。因此,像

The expected value of the run time is all of these added together. So something like

E(运行时间)= N / N + N /(N-1)+ N /(N-2)... + N /(NM)

E(run time) = N/N + N/(N -1) + N/(N -2 ) ... + N/ (N-M)

现在如果M&LT; N / 2,然后在求和的最后一个元素是上面由2界((N / N / 2)== 2))。这也是明显的,在整个求和的最大元素。因此,如果最大的元件是二推介,并有M个元件被求和,在整个运行时间在EV高于由2M界

Now if M < N/2, then the last element in that summation is bounded above by 2. ((N/N/2) == 2)). It's also obviously the largest element in the whole summation. So if the biggest element is two picks, and there are M elements being summed, the EV of the whole run time is bounded above by 2M.

问我,如果任何这还不清楚。纠正我,如果有任何这是错误的:)

Ask me if any of this is unclear. Correct me if any of this is wrong :)

这篇关于编程珍珠 - 随机选择算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆