从有限集中进行天真的随机选择的O值是多少? [英] What is O value for naive random selection from finite set?

查看:58
本文介绍了从有限集中进行天真的随机选择的O值是多少?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

关于从有限集中获取随机值的问题让我开始思考...

This question on getting random values from a finite set got me thinking...

人们通常希望从一组Y值中检索X个唯一值.例如,我可能想从一副纸牌中发牌.我想要5张卡片,而且希望它们都是唯一的.

It's fairly common for people to want to retrieve X unique values from a set of Y values. For example, I may want to deal a hand from a deck of cards. I want 5 cards, and I want them to all be unique.

现在,我可以天真地做到这一点,随机选择5张卡,然后每次获得重复卡时再试一次,直到获得5张卡.但是,对于来自大型集合的大量值而言,这并不是那么好.例如,如果我想要从1,000,000个集合中获得999,999个值,则此方法会变得非常糟糕.

Now, I can do this naively, by picking a random card 5 times, and try again each time I get a duplicate, until I get 5 cards. This isn't so great, however, for large numbers of values from large sets. If I wanted 999,999 values from a set of 1,000,000, for instance, this method gets very bad.

问题是:有多糟?我正在寻找可以解释O()值的人.获取第x个数字将需要y次尝试...但是有多少次?我知道如何找出任意给定值,但是有没有一种直接的方法可以将其概括为整个序列并获得O()值?

The question is: how bad? I'm looking for someone to explain an O() value. Getting the xth number will take y attempts...but how many? I know how to figure this out for any given value, but is there a straightforward way to generalize this for the whole series and get an O() value?

(问题不是:我该如何改进?",因为它相对容易修复,并且我确信在其他地方已经有很多次了.)

(The question is not: "how can I improve this?" because it's relatively easy to fix, and I'm sure it's been covered many times elsewhere.)

推荐答案

变量

n =集合中项目的总数
m =要从n个项目集中检索的唯一值的数量
d(i) =在第i步中达到某个值所需的预期尝试次数
i =表示一个特定步骤.我∈ [0,n-1]
T(m,n) =使用朴素算法从一组n个项目中选择m个唯一项目的预期总尝试次数

Variables

n = the total amount of items in the set
m = the amount of unique values that are to be retrieved from the set of n items
d(i) = the expected amount of tries needed to achieve a value in step i
i = denotes one specific step. i ∈ [0, n-1]
T(m,n) = expected total amount of tries for selecting m unique items from a set of n items using the naive algorithm

第一步i = 0是微不足道的.无论选择哪种值,我们都会在第一次尝试时获得唯一的值.因此:

The first step, i=0, is trivial. No matter which value we choose, we get a unique one at the first attempt. Hence:

d(0)= 1

在第二步中,i = 1,我们至少需要尝试1次(选择有效的唯一值的尝试).最重要的是,我们可能会选择错误的值.机会是(先前选择的项目数量)/(项目总数).在这种情况下,为1/n.如果我们选择了错误的项目,则有1/n的机会,我们可能会再次选择错误的项目.将此乘以1/n,因为这是我们两次都选错了的总概率,得出(1/n) 2 .要理解这一点,绘制一个决策树会很有帮助.两次拣选一个非唯一的物品后,我们很可能会再次拣选它.这导致将(1/n) 3 添加到步骤i = 1中的尝试总数中.每次我们选择错误的号码时,都有机会再次选择错误的号码.结果是:

In the second step, i=1, we at least need 1 try (the try where we pick a valid unique value). On top of this, there is a chance that we choose the wrong value. This chance is (amount of previously picked items)/(total amount of items). In this case 1/n. In the case where we picked the wrong item, there is a 1/n chance we may pick the wrong item again. Multiplying this by 1/n, since that is the combined probability that we pick wrong both times, gives (1/n)2. To understand this, it is helpful to draw a decision tree. Having picked a non-unique item twice, there is a probability that we will do it again. This results in the addition of (1/n)3 to the total expected amounts of tries in step i=1. Each time we pick the wrong number, there is a chance we might pick the wrong number again. This results in:

d(1)= 1 + 1/n +(1/n) 2 +(1/n) 3 +(1/n) 4 + ...
d(1) = 1 + 1/n + (1/n)2 + (1/n)3 + (1/n)4 + ...

类似地,在第i:th步中,一次选择错误项目的机会为i/n,结果是:

Similarly, in the general i:th step, the chance to pick the wrong item in one choice is i/n, resulting in:

d(i)= 1 + i/n +(i/n) 2 +(i/n) 3 +(i/n) 4 + ... =
= sum((i/n) k ),其中k∈ [0,∞]
d(i) = 1 + i/n + (i/n)2 + (i/n)3 + (i/n)4 + ... =
= sum( (i/n)k ), where k ∈ [0,∞]

这是几何序列,因此很容易计算总和:

This is a geometric sequence and hence it is easy to compute it's sum:

d(i)=(1-i/n) -1
d(i) = (1 - i/n)-1

然后通过汇总每个步骤中的预期尝试次数来计算总体复杂度:

The overall complexity is then computed by summing the expected amount of tries in each step:

T(m,n)=和(d(i)),其中i∈ [0,m-1] =
= 1 +(1-1/n) -1 +(1-2 -n) -1 +( 1-3/n) -1 + ... +(1-(m-1)/n) -1
T(m,n) = sum ( d(i) ), where i ∈ [0,m-1] =
= 1 + (1 - 1/n)-1 + (1 - 2/n)-1 + (1 - 3/n)-1 + ... + (1 - (m-1)/n)-1

将上面系列中的分数扩展n,得到:

Extending the fractions in the series above by n, we get:

T(m,n)= n/n + n/(n-1)+ n/(n-2)+ n/(n-3)+ ... + n/(n-m + 2 )+ n/(n-m + 1)
T(m,n) = n/n + n/(n-1) + n/(n-2) + n/(n-3) + ... + n/(n-m+2) + n/(n-m+1)

我们可以使用以下事实:

We can use the fact that:

n/n≤ n/(n-1)≤ n/(n-2)≤ n/(n-3)≤ ...≤ n/(n-m + 2)≤ n/(n-m + 1)
n/n ≤ n/(n-1) ≤ n/(n-2) ≤ n/(n-3) ≤ ... ≤ n/(n-m+2) ≤ n/(n-m+1)

由于该系列包含m个项,并且每个项都满足上述不等式,因此我们得到:

Since the series has m terms, and each term satisfies the inequality above, we get:

T(m,n)≤ n/(n-m + 1)+ n/(n-m + 1)+ n/(n-m + 1)+ n/(n-m + 1)+ ... + n/(n-m +1)+ n/(n-m + 1)=
= m * n/(n-m + 1)
T(m,n) ≤ n/(n-m+1) + n/(n-m+1) + n/(n-m+1) + n/(n-m+1) + ... + n/(n-m+1) + n/(n-m+1) =
= m*n/(n-m+1)

通过使用某种技术来评估级数而不是使用(项数)*(最大项)的粗略方法来界定,可能(并且可能)建立稍微严格的上限.

It might be(and probably is) possible to establish a slightly stricter upper bound by using some technique to evaluate the series instead of bounding by the rough method of (amount of terms) * (biggest term)

这意味着Big-O订单为 O(m * n/(n-m + 1)).我看不出有什么办法可以简化这种表达方式.

This would mean that the Big-O order is O(m*n/(n-m+1)). I see no possible way to simplify this expression from the way it is.

回顾结果以检查是否有意义,我们看到,如果n为常数,并且m越来越接近n,结果将迅速增加,因为分母得到了很小.这就是我们所期望的,例如,如果考虑问题中给出的关于从1,000,000个集合中选择999,999个值"的示例.如果我们改为让m为常数并且n确实非常大地增长,那么复杂度将在极限n→中向O(m)收敛. ∞.这也是我们所期望的,因为从接近"无限大小的集合中选择恒定数量的项目时,选择先前选择的值的可能性基本上为0.因为没有冲突,所以我们需要m次尝试,而不需要n次尝试.

Looking back at the result to check if it makes sense, we see that, if n is constant, and m gets closer and closer to n, the results will quickly increase, since the denominator gets very small. This is what we'd expect, if we for example consider the example given in the question about selecting "999,999 values from a set of 1,000,000". If we instead let m be constant and n grow really, really large, the complexity will converge towards O(m) in the limit n → ∞. This is also what we'd expect, since while chosing a constant number of items from a "close to" infinitely sized set the probability of choosing a previously chosen value is basically 0. I.e. We need m tries independently of n since there are no collisions.

这篇关于从有限集中进行天真的随机选择的O值是多少?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆