整数数组中的唯一随机数生成 [英] Unique random number generation in an integer array

查看:24
本文介绍了整数数组中的唯一随机数生成的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

可能的重复:
O(1) 中的唯一随机数?

如何在 C 中使用唯一值(无重复)填充整数数组?

How do I fill an integer array with unique values (no duplicates) in C?

int vektor[10];   

for (i = 0; i < 10; i++) {
    vektor[i] = rand() % 100 + 1;
}

//No uniqueness here

推荐答案

有多种方法可以解决您的问题,每种方法都有自己的优点和缺点.

There are several ways to solve your problem, each has its own advantages and disadvantages.

首先,我想指出,您已经收到了很多执行以下操作的响应:它们生成一个随机数,然后以某种方式检查它是否已在数组中使用,如果已使用,则它们只需生成另一个数字,直到他们找到未使用的数字.这是一种天真的方法,说实话,有严重缺陷.问题在于数字生成的循环试错性质(如果已经使用,请再试一次").如果数字范围(例如 [1..N])接近所需数组的长度(例如 M),那么算法最后可能会花费大量时间试图找到下一个数字.如果随机数生成器甚至有点损坏(例如,从不生成一些数字,或者很少生成),那么 N == M 算法保证永远循环(或很长一段时间).通常,这种反复试验的方法是无用的,或者充其量是有缺陷的方法.

First I'd like to note that you already got quite a few of responses that do the following: they generate a random number, then check somehow whether it was already used in the array, and if it was already used, they just generate another number until they find an unused one. This is a naive and, truth to be said, seriously flawed approach. The problem is with the cyclic trial-and-error nature of the number generation ("if already used, try again"). If the numeric range (say, [1..N]) is close to the length of the desired array (say, M), then towards the end the algorithm might spend a huge amount of time trying to find the next number. If the random number generator is even a little bit broken (say, never generates some number, or does it very rarely), then with N == M the algorithm is guaranteed to loop forever (or for a very long time). Generally this trial-and-error approach is a useless one, or a flawed one at best.

这里已经介绍的另一种方法是在大小为 N 的数组中生成随机排列.随机排列的想​​法是一个很有前途的方法,但是在大小为 N 的数组上执行它(当 M << N 时)肯定会打个比方,产生的热量多于光.

Another approach already presented here is generating a random permutation in an array of size N. The idea of random permutation is a promising one, but doing it on an array of size N (when M << N) will certainly generate more heat than light, speaking figuratively.

可以在例如 Bentley 的Programming Pearls"(其中一些摘自 Knuth)中找到解决此问题的好方法.

Good solutions to this problem can be found, for example, in Bentley's "Programming Pearls" (and some of them are taken from Knuth).

  • Knuth 算法.这是一个非常简单的算法,复杂度为 O(N)(即数值范围),这意味着当 M 接近 N 时它最有用.但是,除了你的 vektor 数组之外,这个算法不需要任何额外的内存,这与已经提供的带有排列的变体相反(意味着它需要 O(M) 内存,而不是 O(N) 作为其他排列基于此处建议的算法).后者使其成为一种可行的算法,即使对于 M <
  • The Knuth algorithm. This is a very simple algorithm with a complexity of O(N) (i.e. the numeric range), meaning that it is most usable when M is close to N. However, this algorithm doesn't require any extra memory in addition to your vektor array, as opposed to already offered variant with permutations (meaning that it takes O(M) memory, not O(N) as other permutation-based algorithms suggested here). The latter makes it a viable algorithm even for M << N cases.

算法的工作原理如下:遍历从 1 到 N 的所有数字,并以概率 rm/rn 选择当前数字,其中 rm 是我们还剩下多少个数字需要查找,而 rn 是我们还需要迭代多少个数字.这是您案例的可能实现

The algorithm works as follows: iterate through all numbers from 1 to N and select the current number with probability rm / rn, where rm is how many numbers we still need to find, and rn is how many numbers we still need to iterate through. Here's a possible implementation for your case

#define M 10
#define N 100

int in, im;

im = 0;

for (in = 0; in < N && im < M; ++in) {
  int rn = N - in;
  int rm = M - im;
  if (rand() % rn < rm)    
    /* Take it */
    vektor[im++] = in + 1; /* +1 since your range begins from 1 */
}

assert(im == M);

在这个循环之后,我们得到一个数组vektor,其中填充了随机选择的数字升序.升序"位是我们在这里不需要的.所以,为了修复"我们只是对 vektor 的元素进行随机排列,我们就完成了.请注意,这是一个不需要额外内存的 O(M) 排列.(我省略了置换算法的实现.这里已经给出了很多链接.)

After this cycle we get an array vektor filled with randomly chosen numbers in ascending order. The "ascending order" bit is what we don't need here. So, in order to "fix" that we just make a random permutation of elements of vektor and we are done. Note, that the this is a O(M) permutation requiring no extra memory. (I leave out the implementation of the permutation algorithm. Plenty of links was given here already.).

如果您仔细查看此处提出的对长度为 N 的数组进行操作的基于置换的算法,您会发现它们中的大多数几乎都是相同的 Knuth 算法,但为 M 重新制定== N.在这种情况下,上面的选择循环将选择概率为 1 的 [1..N] 范围内的每个数字,有效地转化为数字 1 到 N 的 N 数组的初始化.考虑到这一点,我认为它变得相当很明显,为 M == N 运行这个算法然后截断结果(可能丢弃大部分)比仅仅以原始形式运行这个算法来获取 M 的原始值并得到结果立即,没有任何截断.

If you look carefully at the permutation-based algorithms proposed here that operate on an array of length N, you'll see that most of them are pretty much this very same Knuth algorithm, but re-formulated for M == N. In that case the above selection cycle will chose each and every number in [1..N] range with probabilty 1, effectively turning into initialization of an N-array with numbers 1 to N. Taking this into account, I think it becomes rather obvious that running this algorithm for M == N and then truncating the result (possibly discarding most of it) makes much less sense than just running this algorithm in its original form for the original value of M and getting the result right away, without any truncation.

  • Floyd 算法(参见 此处).这种方法的复杂度约为 O(M)(取决于所使用的搜索结构),因此当 M 不会进行任何那些可恶的试错迭代,试图找到一个未使用的随机数.该算法保证在每次调用随机数生成器后生成一个唯一的随机数.
  • The Floyd algorithm (see here). This approach has the complexity of about O(M) (depends on the search structure used), so it is better suitable when M << N. This approach keeps track of already generated random numbers, so it requires extra memory. However, the beauty of it is that it does not make any of those abominable trial-and-error iterations, trying to find an unused random number. This algorithm is guaranteed to generate one unique random number after each call to the random number generator.

这是针对您的情况的可能实现.(有多种方法可以跟踪已使用的数字.假设 N 不是太大,我将只使用一组标志)

Here's a possible implementation for it for your case. (There are different ways to keep track of already used numbers. I'll just use an array of flags, assuming that N is not prohibitively large)

#define M 10
#define N 100    

unsigned char is_used[N] = { 0 }; /* flags */
int in, im;

im = 0;

for (in = N - M; in < N && im < M; ++in) {
  int r = rand() % (in + 1); /* generate a random number 'r' */

  if (is_used[r])
    /* we already have 'r' */
    r = in; /* use 'in' instead of the generated number */

  assert(!is_used[r]);
  vektor[im++] = r + 1; /* +1 since your range begins from 1 */
  is_used[r] = 1;
}

assert(im == M);

为什么上述工作不是很明显.但它有效.将从 [1..N] 范围内选取均匀分布的正好 M 个数字.

Why the above works is not immediately obvious. But it works. Exactly M numbers from [1..N] range will be picked with uniform distribution.

请注意,对于大 N,您可以使用基于搜索的结构来存储已使用"的数字,从而获得具有 O(M) 内存要求的 O(M log M) 算法.

Note, that for large N you can use a search-based structure to store "already used" numbers, thus getting a nice O(M log M) algorithm with O(M) memory requirement.

(虽然这个算法有一点:虽然结果数组不会被排序,但原始 1..N 排序的某种影响"仍然会出现在结果中.例如,很明显,数字N,如果被选中,只能是结果数组的最后一个成员.如果这种意外排序对结果的污染"是不可接受的,则结果 vektor 数组可以随机打乱,就像在 Khuth 算法中一样).

(There's one thing about this algorithm though: while the resultant array will not be ordered, a certain "influence" of the original 1..N ordering will still be present in the result. For example, it is obvious that number N, if selected, can only be the very last member of the resultant array. If this "contamination" of the result by the unintended ordering is not acceptable, the resultant vektor array can be random-shuffled, just like in the Khuth algorithm).

注意在这两个算法的设计中观察到的非常关键的一点:它们从不循环,试图找到一个新的未使用的随机数.从实际的角度来看,任何使用随机数进行试错迭代的算法都是有缺陷的.此外,这些算法的内存消耗与 M 相关,而非 N

Note the very critical point observed in the design of these two algoritms: they never loop, trying to find a new unused random number. Any algorithm that makes trial-and-error iterations with random numbers is flawed from practical point of view. Also, the memory consumption of these algorithms is tied to M, not to N

对于 OP,我会向 OP 推荐 Floyd 算法,因为在他的应用程序中,M 似乎比 N 小得多,并且它不需要(或可能不需要)进行额外的置换.然而,对于如此小的 N 值,差异可能可以忽略不计.

To the OP I would recommend the Floyd's algorithm, since in his application M seems to be considerably less than N and that it doesn't (or may not) require an extra pass for permutation. However, for such small values of N the difference might be negligible.

这篇关于整数数组中的唯一随机数生成的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆