Uniform distribution of integers using floating point source

Question

The standard way to get a random integer in the range [0, n) in JavaScript - or any other language that only offers a random() function that returns a float in the range [0,1) - is to use Math.floor(Math.random() * n).

Assuming we're operating on the set of rational numbers, the math behind this is trivial. The question is: with all the complications of IEEE-754 floating point numbers, is the resulting distribution actually uniform?

Considering that the gap between one floating point number and the next higher one increases as they grow larger I would think that this should introduce some kind of bias towards smaller numbers.

Recommended answer

No, the resulting distribution is not going to be perfectly uniform, for most values of n. For small values, it'll be so close to uniform that you'd have a hard time detecting any difference from a uniform distribution, but as n gets larger the bias can become noticeable.

To illustrate, here's some Python code (not JavaScript, sorry, but the principle is the same):

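from collections import Counter
from random import random

def badrand(n):
    # Scale a float in [0, 1) up to [0, n) and truncate to an integer.
    return int(random() * n)

# Tally the remainders modulo 3 of 10 million samples with n = 3/4 * 2^53.
print(Counter(badrand(6755399441055744) % 3 for _ in range(10000000)))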

This is producing 10 million random integers in the range [0, 6755399441055744), reducing each of those integers modulo 3, and counting the number of times the remainder is 0, 1, or 2. If we're generating those integers uniformly, we'd expect the remainders modulo 3 to be roughly evenly distributed, so we'd expect the counts to be similar.

Here's an example result from running this on my machine:

Counter({1: 3751915, 0: 3334643, 2: 2913442})

That is, a remainder of 1 is significantly more likely to occur than 0, which in turn is significantly more likely to occur than a remainder of 2. The differences here are way too big to be explained by random variation.
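
To put a rough number on "way too big", one option is a chi-squared goodness-of-fit test on the counts above. The following is a minimal sketch and not part of the original answer. It relies on the fact that with two degrees of freedom the chi-squared p-value is exactly exp(-statistic / 2), so no statistics library is needed.

```python
import math

# Counts reported above for remainders 0, 1 and 2.
observed = {0: 3334643, 1: 3751915, 2: 2913442}

total = sum(observed.values())
expected = total / len(observed)  # uniform hypothesis: one third each

chi2 = sum((count - expected) ** 2 / expected for count in observed.values())

# Three categories give 2 degrees of freedom, and for 2 degrees of
# freedom the chi-squared survival function is exactly exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print(f"chi-squared statistic: {chi2:.1f}")
print(f"p-value: {p_value}")
```

A statistic above roughly 13.8 would already reject uniformity at the 0.1% level. For these counts the statistic is on the order of 10^5, and the p-value underflows to zero in double precision.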

So what went wrong? Python's random() function is relatively high quality, based on the Mersenne Twister, so we're unlikely to be seeing statistical problems resulting from the base random number generator. What's happening is that random() generates one of 2^53 (roughly) equally likely outcomes - each outcome is a number of the form x / 2^53 for some integer x in the range [0, 2^53). Now in the badrand call, we're effectively mapping those outcomes to 6755399441055744 possible outputs. Now that value wasn't chosen at random (ha!); it's exactly 3/4 of 2^53. That means that under the most uniform distribution possible, 2/3 of the possible badrand output values are being hit by exactly one of the 2^53 possible random() output values, while the other 1/3 are being hit by two of the 2^53 possible random() output values. That is, some of the potential outputs are twice as likely to occur as others. So we're a long way from uniform.
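
The counting argument can be checked by brute force on a scaled-down model. The sketch below is an illustration under assumptions: it uses a hypothetical 10-bit source in place of the 53-bit one, and the helper name hits_per_output is made up for this example. It uses exact integer arithmetic, so it shows only the counting argument and not any additional effect of rounding in the floating-point multiplication.

```python
from collections import Counter

def hits_per_output(bits, n):
    """Count how many of the 2**bits equally likely source values x / 2**bits
    land on each output floor(x * n / 2**bits), using exact integer math."""
    return Counter((x * n) >> bits for x in range(2 ** bits))

bits = 10                 # hypothetical small stand-in for 53
n = 3 * 2 ** (bits - 2)   # 3/4 of 2**bits, mirroring the example above

hits = hits_per_output(bits, n)
# Tally how many outputs are hit once and how many are hit twice.
multiplicity = Counter(hits.values())
print(multiplicity)
```

In this model 512 of the 768 outputs (2/3) are hit by one source value and 256 (1/3) are hit by two, matching the proportions described above.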

You're going to see the same effect in JavaScript. In the case of Chrome, it appears that there are only 2^32 distinct results from Math.random(), so you should be able to find effects like the above with n smaller than (but close to) 2^32.

Of course, the same effect holds for small n, too: if n = 5, then because 5 is not a divisor of 2^32 there's no way we can perfectly evenly distribute all 2^32 possible Math.random() results between the 5 desired outcomes: the best we can hope for is that 4 of the 5 outcomes appear for 858993459 of the possible random() results each, while the fifth occurs for 858993460 of the random() results. But that distribution is going to be so close to uniform that it would be well-nigh impossible to find any statistical test to tell you differently. So for practical purposes, you should be safe with small n.
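
The bucket sizes quoted above can be reproduced without enumerating all 2^32 values. This is a small sketch that assumes, as the answer does, a source producing exactly 2^32 equally likely values of the form x / 2^32. The function name bucket_sizes is hypothetical.

```python
def bucket_sizes(n, bits=32):
    """For each output j in [0, n), count the integers x in [0, 2**bits)
    with floor(x * n / 2**bits) == j, without enumerating them."""
    total = 2 ** bits
    # The first x that maps to output j is ceil(j * total / n);
    # -(-a // b) is ceiling division on integers.
    starts = [-(-j * total // n) for j in range(n + 1)]
    return [starts[j + 1] - starts[j] for j in range(n)]

sizes = bucket_sizes(5)
print(sizes)
# Relative excess of the largest bucket over the smallest.
print((max(sizes) - min(sizes)) / min(sizes))
```

For n = 5 one bucket holds 858993460 source values and the other four hold 858993459 each, so the relative difference is roughly one part in a billion.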

There's a related Python bug that might be interesting at http://bugs.python.org/issue9025. That bug was solved for Python 3 by moving away from the int(random() * n) method of computing these numbers. The bug still remains in Python 2, though.
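
One common way to avoid this kind of bias is rejection sampling on raw random bits instead of scaling a float. The sketch below is illustrative only: the name randbelow is hypothetical here, and the code is meant to show the general idea rather than reproduce the actual fix in Python 3. It uses only random.getrandbits from the standard library.

```python
import random

def randbelow(n):
    """Return a uniform random integer in [0, n) by rejection sampling."""
    if n <= 0:
        raise ValueError("n must be positive")
    k = n.bit_length()         # enough bits that 2**k > n
    r = random.getrandbits(k)  # uniform integer in [0, 2**k)
    while r >= n:              # discard out-of-range draws and retry
        r = random.getrandbits(k)
    return r

print(randbelow(6755399441055744))
```

Every accepted value is equally likely because each integer below n corresponds to exactly one k-bit pattern. Since n is at least 2^(k-1), each draw is accepted with probability at least 1/2, so the loop takes fewer than two iterations on average.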
