What is the optimal algorithm for generating an unbiased random integer within a range?


Problem description

In this StackOverflow question:

Generating random integer from a range

the accepted answer suggests the following formula for generating a random integer between given min and max, with both min and max included in the range:

output = min + (rand() % (int)(max - min + 1))

But it also says that

This is still slightly biased towards lower numbers ... It's also possible to extend it so that it removes the bias.

But it doesn't explain why it's biased towards lower numbers or how to remove the bias. So the question is: is this the optimal approach to generating a random integer within a (signed) range without relying on anything fancy, just the rand() function, and if it is optimal, how can the bias be removed?

EDIT:

I've just tested the while-loop algorithm suggested by @Joey against floating-point extrapolation:

static const double s_invRandMax = 1.0/((double)RAND_MAX + 1.0);
return min + (int)(((double)(max + 1 - min))*rand()*s_invRandMax);

to see how uniformly "balls" "fall" into and are distributed among a number of "buckets", with one test for the floating-point extrapolation and another for the while-loop algorithm. But the results varied with the number of "balls" (and "buckets"), so I couldn't easily pick a winner. The working code can be found at this Ideone page. For example, with 10 buckets and 100 balls, the maximum deviation from the ideal probability among buckets is smaller for the floating-point extrapolation than for the while-loop algorithm (0.04 and 0.05 respectively), but with 1000 balls the maximum deviation of the while-loop algorithm is smaller (0.024 and 0.011), and with 10000 balls the floating-point extrapolation again does better (0.0034 and 0.0053), and so on without much consistency.

Since neither algorithm seems to consistently produce a more uniform distribution than the other, I lean towards the floating-point extrapolation, because it appears to run faster than the while-loop algorithm. So is it fine to choose the floating-point extrapolation algorithm, or are my tests/conclusions not completely correct?

Solution

The problem occurs when the number of outputs from the random number generator (RAND_MAX+1) is not evenly divisible by the desired range (max-min+1). Since there will be a consistent mapping from a random number to an output, some outputs will be mapped to more random numbers than others. This is true regardless of how the mapping is done: you can use modulo, division, conversion to floating point, or whatever voodoo you can come up with, and the basic problem remains.
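The counting argument can be checked by brute force. The sketch below is illustrative only: `count_modulo` and `count_scaled` are hypothetical helper names that are not part of the original test program, and `raw_max` stands in for RAND_MAX. Each function walks every raw value 0..raw_max and tallies which output it maps to.

```c
#include <assert.h>

/* Hypothetical helper: count how many raw values 0..raw_max land on each
   of n outputs when mapped with raw % n. counts must hold n ints. */
void count_modulo(int raw_max, int n, int *counts)
{
    assert(n > 0 && raw_max >= 0);
    for (int i = 0; i < n; ++i)
        counts[i] = 0;
    for (int raw = 0; raw <= raw_max; ++raw)
        ++counts[raw % n];
}

/* Hypothetical helper: the same tally for the floating-point scaling
   (int)(n * raw / (raw_max + 1)). */
void count_scaled(int raw_max, int n, int *counts)
{
    assert(n > 0 && raw_max >= 0);
    double inv = 1.0 / ((double)raw_max + 1.0);
    for (int i = 0; i < n; ++i)
        counts[i] = 0;
    for (int raw = 0; raw <= raw_max; ++raw)
        ++counts[(int)((double)n * raw * inv)];
}
```

With raw_max = 255 and n = 10, both functions fill the array with a mix of 25s and 26s. The modulo mapping gives the extra raw value to the lowest outputs, while the scaling spreads the extras across the range, but neither can make the counts equal.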

The magnitude of the problem is very small, and undemanding applications can generally get away with ignoring it. The smaller the range and the larger RAND_MAX is, the less pronounced the effect will be.
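To put a rough number on "very small", here is a sketch for the modulo mapping. The symbols M, n, q and r are introduced here for illustration and do not appear in the original post: M = RAND_MAX + 1 is the number of raw values, n = max - min + 1 is the number of outputs, and M = qn + r with 0 ≤ r < n.

```latex
\[
P(\text{output } k) =
\begin{cases}
\dfrac{q+1}{M}, & 0 \le k < r,\\[4pt]
\dfrac{q}{M},   & r \le k < n,
\end{cases}
\qquad
\left| \frac{P(\text{output } k) - \frac{1}{n}}{\frac{1}{n}} \right|
\le \frac{n}{M}.
\]
```

For n = 10 and RAND_MAX = 32767 (the smallest value the C standard permits), the bound is 10/32768, or about 0.03% relative to the ideal probability.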

I took your example program and tweaked it a bit. First I created a special version of rand that only has a range of 0-255, to better demonstrate the effect. I made a few tweaks to rangeRandomAlg2. Finally I changed the number of "balls" to 1000000 to improve the consistency. You can see the results here: http://ideone.com/4P4HY

Notice that the floating-point version produces two tightly grouped probabilities, near either 0.101 or 0.097, nothing in between. This is the bias in action.
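A back-of-the-envelope check of those two groups, assuming a rand with 256 equally likely values, 10 buckets, and independent draws. The binomial standard error used here is an added framing, not something computed by the test program. Each bucket receives either 26 or 25 of the 256 raw values, p is the ideal bucket probability, and N is the number of balls.

```latex
\[
\frac{26}{256} \approx 0.1016, \qquad
\frac{25}{256} \approx 0.0977, \qquad
\sigma = \sqrt{\frac{p(1-p)}{N}}
       = \sqrt{\frac{0.1 \times 0.9}{10^{6}}} = 0.0003 .
\]
```

The gap between the groups, 1/256 ≈ 0.0039, is more than ten times σ, so it stands out clearly. With only 100 balls the same formula gives σ = 0.03, which swamps a gap of that size.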

I think calling this "Java's algorithm" is a bit misleading - I'm sure it's much older than Java.

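#include <stdlib.h>  /* rand, RAND_MAX */

/* Returns a uniform integer in [min, max].
   Assumes 1 <= max - min + 1 <= RAND_MAX (no overflow in max - min + 1). */
int rangeRandomAlg2 (int min, int max)
{
    int n = max - min + 1;
    int remainder = RAND_MAX % n;
    int x;
    do
    {
        x = rand();
    } while (x >= RAND_MAX - remainder);  /* retry so a multiple of n raw values remains */
    return min + x % n;
}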
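One way to convince yourself that the loop removes the bias is to enumerate every raw value and apply the same acceptance test. This is a sketch only: `count_accepted` is a hypothetical helper, and `raw_max` plays the role of RAND_MAX so that small generators can be checked exhaustively.

```c
#include <assert.h>

/* Hypothetical helper, not from the original post.
   Applies the acceptance test of rangeRandomAlg2 to every raw value
   0..raw_max, tallies accepted values per output in counts[0..n-1],
   and returns how many raw values were rejected. */
int count_accepted(int raw_max, int n, int *counts)
{
    assert(n > 0 && n <= raw_max);
    int remainder = raw_max % n;
    int rejected = 0;

    for (int i = 0; i < n; ++i)
        counts[i] = 0;

    for (int x = 0; x <= raw_max; ++x) {
        if (x >= raw_max - remainder)   /* same condition as the do/while loop */
            ++rejected;
        else
            ++counts[x % n];
    }
    return rejected;
}
```

For raw_max = 255 and n = 10, every output ends up with exactly 25 accepted raw values and 6 are rejected. The condition can reject slightly more than strictly necessary: with raw_max = 255 and n = 16 it discards 16 values even though 256 is divisible by 16. The result is still uniform; only a few extra retries are spent.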
