Can `rand()` in C++ be used to generate unbiased bools?

Question

I have written the following function:

bool random_bool(double probability)
{
    double p_scaled = probability * (RAND_MAX+1) - rand();
    if ( p_scaled >= 1 ) return true;
    if ( p_scaled <= 0 ) return false;
    return random_bool( p_scaled );
}

Given that rand() generates a number from the uniform distribution on {0, 1, ..., RAND_MAX-1, RAND_MAX}, and that numbers from subsequent calls can be treated as independent for all practical purposes except cryptography, this should return true with probability p: the two if statements return true with probability slightly below p and false with probability slightly below 1-p, while the recursive call deals with everything else.
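The reasoning above can be written out as a short derivation. This is only a sketch under the idealized assumption that rand() is exactly uniform and independent between calls; the symbols M, x, k, f and P are introduced here for illustration and do not appear in the original post.

```latex
\begin{aligned}
M &= \texttt{RAND\_MAX} + 1, \qquad x = pM, \qquad k = \lfloor x \rfloor, \qquad f = x - k,\\
\Pr[\text{true returned directly}] &= \Pr[\texttt{rand()} \le x - 1] = \frac{k}{M},\\
\Pr[\text{false returned directly}] &= \Pr[\texttt{rand()} \ge x] = \frac{M - \lceil x \rceil}{M},\\
\Pr[\text{recursive call with argument } f] &= \Pr[\texttt{rand()} = k] = \frac{1}{M} \quad \text{(only when } 0 < f < 1\text{)},\\
P(p) &= \frac{k}{M} + \frac{1}{M}\,P(f).
\end{aligned}
```

Here P(q) denotes the probability that random_bool(q) returns true. The choice P(q) = q satisfies the last line, since (k + f)/M = x/M = p, so with an ideal rand() the function would be unbiased. Any systematic deviation in a test therefore points at the generator rather than at the recursion.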

However the following test fails:

long long N = 10000000000; //1e10
double p = 10000.0 / N;
int counter = 0;
for (long long i=0;i<N;i++) if (random_bool(p)) counter++;
assert(9672 < counter && counter <= 10330);

The assert statement is designed to fail in only 0.1% of cases. However, it fails every time (with counter ending up between 10600 and 10700).
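The 0.1% figure can be sanity-checked with a normal approximation to the binomial distribution. The helper below is a minimal sketch: the name outside_probability and the use of a 0.5 continuity correction are choices made here for illustration, not something taken from the original post.

```cpp
#include <cassert>
#include <cmath>

// Normal-approximation estimate of the probability that a Binomial(n, p)
// count falls outside the interval (lo, hi], i.e. count <= lo or count > hi.
// A continuity correction of 0.5 is applied at both edges.
double outside_probability(double n, double p, double lo, double hi) {
    assert(n > 0.0 && p > 0.0 && p < 1.0 && lo < hi);
    const double mean = n * p;
    const double sd = std::sqrt(n * p * (1.0 - p));
    const double z_lo = (lo + 0.5 - mean) / sd;  // edge of the low tail
    const double z_hi = (hi + 0.5 - mean) / sd;  // edge of the high tail
    // The standard normal CDF is Phi(z) = 0.5 * erfc(-z / sqrt(2)).
    const double low_tail = 0.5 * std::erfc(-z_lo / std::sqrt(2.0));
    const double high_tail = 0.5 * std::erfc(z_hi / std::sqrt(2.0));
    return low_tail + high_tail;
}
```

With the values from the test, outside_probability(1e10, 1e-6, 9672, 10330), the mean is 10000 and the standard deviation is about 100, so the bounds sit roughly 3.3 standard deviations from the mean and the estimate comes out at about 0.001. A counter of 10600 or more is around 6 standard deviations away, which is why the failure cannot plausibly be bad luck.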

What's wrong?

P.S.: I've seen this question, but it doesn't help...

Solution

One common defect in random number generators is a slight bias towards smaller results (basically a slight bias towards 0 in high order bits). This often happens when wrapping the RNG internal state to the output range is done using a simple mod, which is biased against high values unless RAND_MAX is a divisor of the size of the internal state. Here's a typical biased mapping implementation:

static unsigned int state;

int rand() {
   state = nextState(); /* this actually moves the state from one random value to the next, e.g., using an LCG */
   return state % RAND_MAX;  /* biased */
}

The bias occurs because the lower output values have one more state value mapping to them under mod. E.g., if the state can have values 0-9 (10 values), and RAND_MAX is 3 (so values 0-2), then the % 3 operation results in the following outputs, depending on the state:

Output  State
0       0 3 6 9 
1       1 4 7
2       2 5 8

The result 0 is over-represented because it has a 4/10 chance of being selected, vs 3/10 for the other values.
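The table above can be reproduced by counting, for every output value, how many states map to it. This is a small sketch; count_preimages is a hypothetical helper name introduced here, not part of any library.

```cpp
#include <cassert>
#include <vector>

// For each output value in [0, modulus), counts how many state values in
// [0, state_count) map to it under `state % modulus`.
std::vector<int> count_preimages(int state_count, int modulus) {
    assert(state_count > 0 && modulus > 0);
    std::vector<int> counts(modulus, 0);
    for (int state = 0; state < state_count; ++state) {
        ++counts[state % modulus];
    }
    return counts;
}
```

Calling count_preimages(10, 3) yields the counts 4, 3, 3, matching the table. The counts are all equal only when modulus divides state_count evenly.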

As an example with more likely values, if the internal RNG state is a 16-bit integer (65,536 possible values) and RAND_MAX is 35767 (as you mentioned it is on your platform), then the values in [0, 29768] will each be output for 2 different state values, but the remaining ~6,000 values will be output for only 1 state value: a significant bias. This kind of bias would tend to cause your counter value to be higher than expected, since smaller-than-uniform returns from rand() favor the p_scaled >= 1 condition.

It would help if you could post the exact implementation of rand() on your platform. If it turns out to be bias in the high bits, you may be able to eliminate it by passing the values you get from rand() through a good hash function, but a better approach is probably just to use a high-quality source of random numbers, e.g., the Mersenne Twister. A better generator will also have a larger output range (effectively, a higher RAND_MAX), which means your algorithm will suffer fewer retries and less recursion.
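As one possible way to follow the Mersenne Twister suggestion, the C++11 <random> header provides std::mt19937_64 and std::bernoulli_distribution. The sketch below assumes a C++11 or later compiler; random_bool_mt is a hypothetical name chosen here to keep it distinct from the function in the question.

```cpp
#include <cassert>
#include <random>

// Returns true with the given probability, drawing randomness from the
// supplied Mersenne Twister engine instead of rand().
bool random_bool_mt(double probability, std::mt19937_64& engine) {
    assert(probability >= 0.0 && probability <= 1.0);
    std::bernoulli_distribution dist(probability);
    return dist(engine);
}
```

The engine is passed by reference so that one generator is shared across calls; constructing a fresh engine per call would repeat the same sequence. A fixed seed such as std::mt19937_64 engine(12345) gives reproducible runs, while seeding from std::random_device is the usual choice when runs should differ.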

Even if the Visual Studio runtime implementation suffers from this defect, it is worth noting that it was probably at least partly an intentional design choice. Using a RAND_MAX like 35767 that is relatively prime to the state size (typically a power of 2) ensures better randomness of the lower bits, since the % operation effectively mixes the high and low order bits. Having biased or non-random low order bits is often a bigger problem in practice than a slight bias in the high order bits, because callers of rand() so often reduce the range using %, which effectively uses only the low order bits when the modulus is a power of 2 (also very common).
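To illustrate why weak low order bits matter, consider a textbook LCG whose modulus is a power of 2. With an odd multiplier and an odd increment, the lowest bit of the raw state simply alternates, so a caller taking the raw state % 2 would see a strictly repeating pattern. The sketch below uses illustrative constants and a hypothetical helper lcg_lowest_bits; it is not the Visual Studio implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Runs a simple LCG with modulus 2^32 (via unsigned wrap-around) for n steps
// and records the lowest bit of each raw state. The constants are
// illustrative only and are not taken from any particular runtime's rand().
std::vector<int> lcg_lowest_bits(std::uint32_t seed, int n) {
    assert(n >= 0);
    std::vector<int> bits;
    bits.reserve(n);
    std::uint32_t state = seed;
    for (int i = 0; i < n; ++i) {
        state = state * 1103515245u + 12345u;           // odd multiplier, odd increment
        bits.push_back(static_cast<int>(state & 1u));  // same as state % 2
    }
    return bits;
}
```

Because an odd multiplier preserves the parity of the state and an odd increment flips it, consecutive entries of the returned vector always differ. Mapping the state to the output with a modulus that is not a power of 2, or discarding the low bits, avoids exposing this pattern to callers.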
