How does bias manifest in bounded random number generation


Question

I am trying to digest the following post, https://www.pcg-random.org/posts/bounded-rands.html, on unbiased, efficient random number generation.

Here is an excerpt describing the classical, modulo approach.

uint32_t bounded_rand(rng_t& rng, uint32_t range) {
    return rng() % range;
}

But in addition to being slow, it is also biased. To understand why rand() % 52 produces biased numbers, if we assume that rand() produces numbers in the range [0..2^32), observe that 52 does not perfectly divide 2^32, it divides it 82,595,524 times with remainder 48. Meaning that if we use rand() % 52, there will be 82,595,525 ways to select the first 48 cards from our 52-card deck and only 82,595,524 ways to select the final four cards. In other words, there is a 0.00000121% bias against these last four cards...

The post goes on to show another technique that uses floating-point arithmetic to essentially generate a random fraction of the desired range and truncate it to an integer.

static uint32_t bounded_rand(rng_t& rng, uint32_t range) {
    double zeroone = 0x1.0p-32 * rng();
    return range * zeroone;
}

This approach is just as biased as the classic modulo approach, but the bias manifests itself differently. For example, if we were choosing numbers in the range [0..52), the numbers 0, 13, 26 and 39 would appear once less often than the others.

The last paragraph is what has me confused. I am not well versed in floating-point arithmetic, so I am struggling to make the connection between the bias in the modulo method and the bias in the floating-point method. All I see is that in both techniques, 4 numbers are biased against.

Solution

Let's start small. Say we have a method rng() that generates a uniformly random integer in [0, 128). If we map all of its 128 outcomes as follows (where X is one of these outcomes):

 floor((X / 128.0) * 52)

Then we get the following table:

 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8, 8, 8, 9, 9, 10, 10, 10, 11, 11, 12, 12, 13, 13, 13, 14, 14, 15, 15, 15, 16, 16, 17, 17, 17, 18, 18, 19, 19, 19, 20, 20, 21, 21, 21, 22, 22, 23, 23, 23, 24, 24, 25, 25, 26, 26, 26, 27, 27, 28, 28, 28, 29, 29, 30, 30, 30, 31, 31, 32, 32, 32, 33, 33, 34, 34, 34, 35, 35, 36, 36, 36, 37, 37, 38, 38, 39, 39, 39, 40, 40, 41, 41, 41, 42, 42, 43, 43, 43, 44, 44, 45, 45, 45, 46, 46, 47, 47, 47, 48, 48, 49, 49, 49, 50, 50, 51, 51

Note that some numbers occur twice in this table and others three times (24 of the 52 values occur three times, the other 28 twice). This is because we're mapping a large range onto a smaller one, 128 is not divisible by 52, and every result is rounded down to an integer. In this example, 52 divided by 128 is about 0.4 (exactly 0.40625), so each entry in the table is the previous entry plus about 0.4 before rounding. Once all the entries are rounded down, some numbers end up occurring more often than others. On the other hand, if we used 64 instead of 52, then each of the 64 possible values would occur exactly twice in the 128-item table.
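To compare this with the modulo method from the question, here is a minimal sketch that counts how often each value in [0, 52) is produced by the two mappings over all 128 inputs. The helper names count_scaled and count_modulo are made up for illustration and are not from the linked post.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: tallies floor((X / 128.0) * 52) for every X in [0, 128).
std::array<int, 52> count_scaled() {
    std::array<int, 52> counts{};
    for (uint32_t x = 0; x < 128; ++x) {
        // Conversion to an unsigned integer truncates, which equals floor
        // for non-negative values.
        uint32_t r = static_cast<uint32_t>((x / 128.0) * 52);
        ++counts[r];
    }
    return counts;
}

// Hypothetical helper: tallies the classic X % 52 over the same inputs.
std::array<int, 52> count_modulo() {
    std::array<int, 52> counts{};
    for (uint32_t x = 0; x < 128; ++x) {
        ++counts[x % 52];
    }
    return counts;
}
```

In this scaled-down setting both mappings give 24 values three hits and 28 values two hits, since 128 = 2 * 52 + 24. The difference is where the extra hits land: the modulo version piles them onto the first 24 values (0 to 23), while the scaling version spreads them across the whole range (0, 2, 4, ..., 13, 15, ...).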

See also "A Fast Alternative to the Modulo Reduction" by Daniel Lemire.
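As I understand that article, the same scaling can be done with integers only: multiply by the range in 64 bits and keep the high 32 bits. The sketch below assumes a 32-bit input, and the function name reduce is chosen here for illustration.

```cpp
#include <cstdint>

// Multiply-and-shift reduction (sketch): maps a 32-bit x into [0, range).
// Mathematically this is floor(x * range / 2^32), i.e. the same scaling as
// the floating-point version, without leaving integer arithmetic.
uint32_t reduce(uint32_t x, uint32_t range) {
    return static_cast<uint32_t>((static_cast<uint64_t>(x) * range) >> 32);
}
```

Because it computes the same floor of a scaled value, it has the same kind of bias as the table above whenever the range does not divide 2^32.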


Here is how the table above was formed in detail. If we mapped these outcomes as follows instead:

X / 128.0

Then the start of the table will look like:

0.000, 0.008, 0.016, 0.023, 0.031, 0.039, 0.047, 0.055, 0.062, 0.070, 0.078, 0.086, 0.094, 0.102, 0.109, 0.117, 0.125, 0.133, ...

If we multiply this table by 52, it will now look like:

0.000, 0.406, 0.812, 1.219, 1.625, 2.031, 2.438, 2.844, 3.250, 3.656, 4.062, 4.469, 4.875, 5.281, 5.688, 6.094, 6.500, 6.906, 7.312, ...

And finally we round down to get:

0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 7, ...
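The three stages above can be reproduced for a single input with a small sketch like the following. The Stages struct and map_stages function are illustrative names, not part of any library.

```cpp
#include <cmath>
#include <cstdint>

// Illustrative container for the intermediate values of one mapping.
struct Stages {
    double fraction;  // X / 128.0
    double scaled;    // fraction * 52
    uint32_t result;  // scaled, rounded down
};

// Computes the three stages for one outcome X in [0, 128).
Stages map_stages(uint32_t x) {
    double fraction = x / 128.0;
    double scaled = fraction * 52;
    uint32_t result = static_cast<uint32_t>(std::floor(scaled));
    return {fraction, scaled, result};
}
```

For example, map_stages(3) gives a fraction of 0.0234375, a scaled value of 1.21875 and a result of 1, matching the fourth entry of each table.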
