Why do people say there is modulo bias when using a random number generator?


Question


I have seen this question asked a lot but never seen a true concrete answer to it. So I am going to post one here which will hopefully help people understand why exactly there is "modulo bias" when using a random number generator, like rand() in C++.

Solution

So rand() is a pseudo-random number generator which returns an integer between 0 and RAND_MAX inclusive, where RAND_MAX is a constant defined in cstdlib (see this article for a general overview on rand()).

Now what happens if you want to generate a random number between say 0 and 2? For the sake of explanation, let's say RAND_MAX is 10 and I decide to generate a random number between 0 and 2 by calling rand()%3. However, rand()%3 does not produce the numbers between 0 and 2 with equal probability!

When rand() returns 0, 3, 6, or 9, rand()%3 == 0. Therefore, P(0) = 4/11

When rand() returns 1, 4, 7, or 10, rand()%3 == 1. Therefore, P(1) = 4/11

When rand() returns 2, 5, or 8, rand()%3 == 2. Therefore, P(2) = 3/11

This does not generate the numbers between 0 and 2 with equal probability. Of course, for small ranges this might not be the biggest issue, but for a larger range it could skew the distribution, biasing it toward the smaller numbers.
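The counts above can be checked by brute force. The following is a minimal sketch, not part of the original answer: count_residues is a hypothetical helper, and max_value stands in for RAND_MAX so that the toy value 10 can be used.

```cpp
#include <vector>

// Hypothetical helper: for every value 0..max_value (inclusive), count how
// many of them fall into each residue class modulo n.
// max_value stands in for RAND_MAX; n is the size of the target range.
std::vector<int> count_residues(int max_value, int n) {
    std::vector<int> counts(n, 0);
    for (int v = 0; v <= max_value; ++v) {
        ++counts[v % n];
    }
    return counts;
}
```

Calling count_residues(10, 3) gives the counts 4, 4 and 3 out of 11 possible values, which matches P(0), P(1) and P(2) above.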

So when does rand()%n return each number from 0 to n-1 with equal probability? When RAND_MAX%n == n - 1, that is, when the RAND_MAX + 1 possible outputs split evenly into n groups. In that case, together with our earlier assumption that rand() returns every number between 0 and RAND_MAX with equal probability, the residue classes modulo n are also equally likely.
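As a small illustration of that condition, here is a sketch of a check, assuming a uniform generator over 0..max_value. The name modulo_is_unbiased is made up for this example, and max_value again stands in for RAND_MAX.

```cpp
// Hypothetical helper: true when taking a uniform value in 0..max_value
// modulo n yields every residue 0..n-1 equally often.
// Equivalent to asking whether (max_value + 1) is a multiple of n.
bool modulo_is_unbiased(long long max_value, long long n) {
    return max_value % n == n - 1;
}
```

With the toy value from above, modulo_is_unbiased(10, 3) is false, while a maximum of 11 (twelve possible outputs) would make it true.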

So how do we solve this problem? A crude way is to keep generating random numbers until you get a number in your desired range:

int x; 
do {
    x = rand();
} while (x >= n);

but that's inefficient for low values of n, since each call has only about an n/(RAND_MAX + 1) chance of landing in your range, and so you'll need roughly (RAND_MAX + 1)/n calls to rand() on average.

A more efficient approach is to take a large range whose length is divisible by n, such as the first RAND_MAX - RAND_MAX % n values, keep generating random numbers until you get one that lies in that range, and then take the modulus (this assumes n is no larger than RAND_MAX; otherwise the range is empty and the loop never ends):

int x;

do {
    x = rand();
} while (x >= (RAND_MAX - RAND_MAX % n));

x %= n;

For small values of n, this will rarely require more than one call to rand().
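Wrapped up as a function, the loop above might look like the following sketch. The name unbiased_rand and the argument check are additions for illustration, not part of the original answer, and the result is only as uniform as rand() itself.

```cpp
#include <cstdlib>
#include <stdexcept>

// Hypothetical helper: returns a value in [0, n) without modulo bias,
// assuming rand() is uniform over 0..RAND_MAX.
int unbiased_rand(int n) {
    // For n > RAND_MAX the limit below would be 0 and the loop would never end.
    if (n <= 0 || n > RAND_MAX) {
        throw std::invalid_argument("n must be in [1, RAND_MAX]");
    }
    const int limit = RAND_MAX - RAND_MAX % n;  // a multiple of n
    int x;
    do {
        x = std::rand();
    } while (x >= limit);  // reject the leftover values at the top
    return x % n;
}
```

Because fewer than half of the possible outputs are ever rejected, the loop needs fewer than two calls on average for any valid n.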

