从大小为N的数组生成一组M个元素 [英] Generate a set of M elements from an array of size N
问题描述
我正在尝试了解以下任务的解决方案: 从大小为N的数组中随机生成一组M个元素.每个元素的选择概率必须相同.
I'm trying to understand solution for the following task: Randomly generate a set of M elements from an array of size N. Each element must have equal probability of being chosen.
I found the following solution (I've already read this question, but it does not answer my question):
int rand(Random random, int min, int max) {
return random.nextInt(1 + max - min) + min;
}
char[] generateArray(char[] original, int subsetSize) {
char[] subset = new char[subsetSize];
Random random = new Random();
for (int i = 0; i < subsetSize; i++) {
subset[i] = original[i];
}
for (int i = subsetSize; i < original.length; i++) {
int r = rand(random,0, i);
boolean takeIthElement = r < subsetSize;
if (takeIthElement) {
subset[r] = original[i];
}
}
return subset;
}
// rand() function returns inclusive value
// i.e. rand(0, 5) will return from 0 to 5
此代码可在破解编码访谈"一书中找到(硬性部分,任务3). 作者对此解释如下:
This code was found in book "Cracking the coding interview" (Section Hard, Task 3). Author explains it as follows:
假设我们有一个算法可以从大小为
n - 1
的数组中提取随机的m
个元素集.我们如何使用这种算法从大小为n
的数组中提取随机的m
元素集?我们首先可以从前一个n - 1
元素中提取大小为m的随机集合.然后,我们只需要确定是否应将array[n]
插入到我们的子集中(这将需要从中提取随机元素).一种简单的方法是从0到n中选择一个随机数k.如果为k < m
,则将array[n]
插入subset[k]
.这将公平地"(即,具有成比例的概率)将array[n]
插入子集中,并且公平地"从子集中删除随机元素. 迭代编写甚至更干净.在这种方法中,我们将数组子集初始化为原始数组中的第一个m
元素.然后,我们遍历数组,从元素m
开始,每当k < m
时将array[i]
插入(随机)位置k
的子集中.
Suppose we have an algorithm that can pull a random set of
m
elements from an array of sizen - 1
. How can we use this algorithm to pull a random set ofm
elements from an array of sizen
? We can first pull a random set of size m from the firstn - 1
elements. Then, we just need to decide ifarray[n]
should be inserted into our subset (which would require pulling out a random element from it). An easy way to do this is to pick a random number k from 0 through n. Ifk < m
, then insertarray[n]
intosubset[k]
. This will both "fairly" (i.e., with proportional probability) insertarray[n]
into the subset and "fairly" remove a random element from the subset. This is even cleaner to write iteratively. In this approach, we initialize an array subset to be the firstm
elements in original. Then, we iterate through the array, starting at elementm
, insertingarray[i]
into the subset at (random) positionk
wheneverk < m
.
我认为作者想说我们需要生成未设置,而是生成数组.因此,我认为正确的任务描述应该是: 从大小为N的数组中随机生成一个由M个元素组成的数组.每个元素的选择概率必须相同.
I think author wanted to say that we need to generate not set, but array. So, I think the right task descriptions should be: Randomly generate an array of M elements from an array of size N. Each element must have equal probability of being chosen.
如果为true,则上述代码将无法正常工作.原因:
If it true, than the code above does not work correctly. Reasons:
- 例如,我们有一个数组
{'1', '2', 'a', 'b'}
和m = 2
- 因此,我们应该具有生成以下几组的概率:
{1, 2}; {2, 1}; {1, a}; {a, 1}; {1, b}; {b, 1}; {a, 2}; {2, a}; {b, 2}; {2, b}; {a, b}; {b, a}
我在这里担心的是,该函数永远不会生成以下集合:
{2, 1}; {2, a}; {2, b}
My concern here is that function will never generate the following sets:
{2, 1}; {2, a}; {2, b}
所以,这意味着它是不正确的.
So, it means that it is incorrect.
推荐答案
如何用数学证明呢?
How can I prove it with math?
第二个for
循环运行两次,首先i
等于2,然后i
等于3.
Your second for
loop runs twice, first with i
equal to 2, then with i
equal to 3.
当i
为2时,r
变为0、1或2,每个概率为1/3.因此,char a
会以索引0或1或根本不索引的形式移到您的结果中,每个概率为1/3.现在是[a,2],[1,a]或[1,2].
When i
is 2, r
becomes 0, 1 or 2, each with probability 1/3. So the char a
is moved into your result at index 0 or 1 or not at all, each with probability 1/3. Now it is either [a, 2], [1, a] or [1, 2].
当i
为3时,r
为0、1、2或3.b
以1/4的概率移至索引0,以1/4的概率移至索引1,并且不以任何概率移至任何位置概率为1/2.
When i
is 3, r
is 0, 1, 2 or 3. b
is moved to index 0 with probability 1/4, to index 1 with probability 1/4 and is not moved anywhere with probability 1/2.
在下表中,我给出了所有可能情况下的结果. r
,0、1和2下的值是第一次迭代中的可能值(i
= 2).右边的r
是第二次迭代中的可能值.
In the following table I have given the result in all possible cases. The values under r
, 0, 1 and 2, are the possible values in the first iteration (i
= 2). To the right or r
are the possible values in the second iteration.
r 0 1 2 3
0 [b, 2] [a, b] [a, 2] [a, 2]
1 [b, a] [1, b] [1, a] [1, a]
2 [b, 2] [1, b] [1, 2] [1, 2]
因此在表中您可以看到,如果两次r
均为0,则您的方法将返回[b, 2]
,依此类推.
So in the table you can read that if r
is 0 both times, your method will return [b, 2]
, etc.
表中的12个单元格中每个单元具有相等的概率,即1/12.让我们检查一下:[1,2],[1,a],[1,b],[a,2]和[b,2]分别存在两次. [a,b]和[b,a]各自出现一次,但是它们是同一集合,因此该集合也出现两次.这涵盖了所有可能的子集,因此它们的可能性也相同.
Each of the 12 cells in the table has equal probability, that is 1/12. Let’s inspect: [1, 2], [1, a], [1, b], [a, 2] and [b, 2] are there twice each. [a, b] and [b, a] come once each, but these are the same set, so that set comes twice too. This covers all possible subsets, so they are equally likely.
这篇关于从大小为N的数组生成一组M个元素的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!