从大小为N的数组生成一组M个元素 [英] Generate a set of M elements from an array of size N

查看:115
本文介绍了从大小为N的数组生成一组M个元素的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试了解以下任务的解决方案: 从大小为N的数组中随机生成一组M个元素.每个元素的选择概率必须相同.

I'm trying to understand solution for the following task: Randomly generate a set of M elements from an array of size N. Each element must have equal probability of being chosen.

我找到了以下解决方案(我已经阅读了

I found the following solution (I've already read this question, but it does not answer my question):

int rand(Random random, int min, int max) {
  return random.nextInt(1 + max - min) + min;
}

char[] generateArray(char[] original, int subsetSize) {
  char[] subset = new char[subsetSize];
  Random random = new Random();

  for (int i = 0; i < subsetSize; i++) {
    subset[i] = original[i];
  }
  for (int i = subsetSize; i < original.length; i++) {
    int r = rand(random,0, i);
    boolean takeIthElement = r < subsetSize;
    if (takeIthElement) {
      subset[r] = original[i];
    }
  }

  return subset;
}
// rand() function returns inclusive value 
// i.e. rand(0, 5) will return from 0 to 5

此代码可在破解编码访谈"一书中找到(硬性部分,任务3). 作者对此解释如下:

This code was found in book "Cracking the coding interview" (Section Hard, Task 3). Author explains it as follows:

假设我们有一个算法可以从大小为n - 1的数组中提取随机的m个元素集.我们如何使用这种算法从大小为n的数组中提取随机的m元素集?我们首先可以从前一个n - 1元素中提取大小为m的随机集合.然后,我们只需要确定是否应将array[n]插入到我们的子集中(这将需要从中提取随机元素).一种简单的方法是从0到n中选择一个随机数k.如果为k < m,则将array[n]插入subset[k].这将公平地"(即,具有成比例的概率)将array[n]插入子集中,并且公平地"从子集中删除随机元素. 迭代编写甚至更干净.在这种方法中,我们将数组子集初始化为原始数组中的第一个m元素.然后,我们遍历数组,从元素m开始,每当k < m时将array[i]插入(随机)位置k的子集中.

Suppose we have an algorithm that can pull a random set of m elements from an array of size n - 1. How can we use this algorithm to pull a random set of m elements from an array of size n? We can first pull a random set of size m from the first n - 1 elements. Then, we just need to decide if array[n] should be inserted into our subset (which would require pulling out a random element from it). An easy way to do this is to pick a random number k from 0 through n. If k < m, then insert array[n] into subset[k]. This will both "fairly" (i.e., with proportional probability) insert array[n] into the subset and "fairly" remove a random element from the subset. This is even cleaner to write iteratively. In this approach, we initialize an array subset to be the first m elements in original. Then, we iterate through the array, starting at element m, inserting array[i] into the subset at (random) position k whenever k < m.

我认为作者想说我们需要生成未设置,而是生成数组.因此,我认为正确的任务描述应该是: 从大小为N的数组中随机生成一个由M个元素组成的数组.每个元素的选择概率必须相同.

I think author wanted to say that we need to generate not set, but array. So, I think the right task descriptions should be: Randomly generate an array of M elements from an array of size N. Each element must have equal probability of being chosen.

如果为true,则上述代码将无法正常工作.原因:

If it true, than the code above does not work correctly. Reasons:

  1. 例如,我们有一个数组{'1', '2', 'a', 'b'}m = 2
  2. 因此,我们应该具有生成以下几组的概率:

{1, 2}; {2, 1}; {1, a}; {a, 1}; {1, b}; {b, 1}; {a, 2}; {2, a}; {b, 2}; {2, b}; {a, b}; {b, a}

我在这里担心的是,该函数永远不会生成以下集合: {2, 1}; {2, a}; {2, b}

My concern here is that function will never generate the following sets: {2, 1}; {2, a}; {2, b}

所以,这意味着它是不正确的.

So, it means that it is incorrect.

推荐答案

如何用数学证明呢?

How can I prove it with math?

第二个for循环运行两次,首先i等于2,然后i等于3.

Your second for loop runs twice, first with i equal to 2, then with i equal to 3.

i为2时,r变为0、1或2,每个概率为1/3.因此,char a会以索引0或1或根本不索引的形式移到您的结果中,每个概率为1/3.现在是[a,2],[1,a]或[1,2].

When i is 2, r becomes 0, 1 or 2, each with probability 1/3. So the char a is moved into your result at index 0 or 1 or not at all, each with probability 1/3. Now it is either [a, 2], [1, a] or [1, 2].

i为3时,r为0、1、2或3.b以1/4的概率移至索引0,以1/4的概率移至索引1,并且不以任何概率移至任何位置概率为1/2.

When i is 3, r is 0, 1, 2 or 3. b is moved to index 0 with probability 1/4, to index 1 with probability 1/4 and is not moved anywhere with probability 1/2.

在下表中,我给出了所有可能情况下的结果. r,0、1和2下的值是第一次迭代中的可能值(i = 2).右边的r是第二次迭代中的可能值.

In the following table I have given the result in all possible cases. The values under r, 0, 1 and 2, are the possible values in the first iteration (i = 2). To the right or r are the possible values in the second iteration.

r    0       1       2       3
0  [b, 2]  [a, b]  [a, 2]  [a, 2]
1  [b, a]  [1, b]  [1, a]  [1, a]
2  [b, 2]  [1, b]  [1, 2]  [1, 2]

因此在表中您可以看到,如果两次r均为0,则您的方法将返回[b, 2],依此类推.

So in the table you can read that if r is 0 both times, your method will return [b, 2], etc.

表中的12个单元格中每个单元具有相等的概率,即1/12.让我们检查一下:[1,2],[1,a],[1,b],[a,2]和[b,2]分别存在两次. [a,b]和[b,a]各自出现一次,但是它们是同一集合,因此该集合也出现两次.这涵盖了所有可能的子集,因此它们的可能性也相同.

Each of the 12 cells in the table has equal probability, that is 1/12. Let’s inspect: [1, 2], [1, a], [1, b], [a, 2] and [b, 2] are there twice each. [a, b] and [b, a] come once each, but these are the same set, so that set comes twice too. This covers all possible subsets, so they are equally likely.

这篇关于从大小为N的数组生成一组M个元素的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆