Is Python's random.randint statistically random?


Question

So I'm testing and calculating the probabilities of certain dice rolls for a game. The base case is rolling one 10-sided die.

I did a million samples of this, and ended up with the following proportions:

Result  Proportion
0       0.000000000000000%
1       10.038789961210000%
2       10.043589956410000%
3       9.994890005110000%
4       10.025289974710000%
5       9.948090051909950%
6       9.965590034409970%
7       9.990190009809990%
8       9.985490014509990%
9       9.980390019609980%
10      10.027589972410000%

These should of course all be 10%. The standard deviation of these results is 0.0323207%, which seems rather high to me. Is it just coincidence? As I understand it, the random module provides proper pseudo-random numbers, i.e. ones from a method that passes the statistical tests for randomness. Or are these pseudo-pseudo-random number generators?

Should I be using a cryptographic pseudo-random number generator? I'm fairly sure I don't need a true random number generator (see http://www.random.org/, http://en.wikipedia.org/wiki/Hardware_random_number_generator).

I am currently regenerating all my results with 1 billion samples (because why not, I have a crunchy server at my disposal, and some sleep to do).

Recommended answer

Martijn's answer is a pretty succinct review of the random number generators that Python has access to.

If you want to check out the properties of the generated pseudo-random data, download random.zip from http://www.fourmilab.ch/random/ and run it on a big sample of random data. The χ² (chi-squared) test in particular is very sensitive to randomness. For a sequence to be really random, the percentage from the χ² test should be between 10% and 90%.

For a game I'd guess that the Mersenne Twister that Python uses internally should be sufficiently random (unless you're building an online casino :-).
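
As a rough sanity check on the figures in the question: if each face comes up with probability 0.1, the proportion observed over a million rolls has a standard deviation of sqrt(0.1 × 0.9 / 10^6), which is 0.03 percentage points. The reported 0.0323207% is therefore about what a fair die should produce. The sketch below reproduces both numbers; it assumes exactly 1,000,000 rolls and uses the percentages copied from the question's table (the variable names are illustrative).

```python
import math
import statistics

# Percentages for faces 1..10, copied from the table in the question.
observed = [
    10.03878996121, 10.04358995641, 9.99489000511, 10.02528997471,
    9.94809005190995, 9.96559003440997, 9.99019000980999,
    9.98549001450999, 9.98039001960998, 10.02758997241,
]

n_rolls = 1_000_000  # assumption: the sample size stated in the question
p = 0.1              # probability of each face on a fair 10-sided die

# Standard deviation of a binomial proportion, in percentage points.
expected_sd = 100 * math.sqrt(p * (1 - p) / n_rolls)

# Sample standard deviation of the ten observed percentages.
observed_sd = statistics.stdev(observed)

print(f"expected spread: {expected_sd:.4f}%")
print(f"observed spread: {observed_sd:.4f}%")
```

The two values agree to within a few thousandths of a percentage point, so the spread on its own is no reason to suspect the generator.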

If you want pure randomness and you are using Linux, you can read from /dev/random. This only produces random data from the kernel's entropy pool (gathered from the unpredictable times at which interrupts arrive), so it will block if you exhaust the pool. This entropy is used to initialize (seed) the PRNG used by /dev/urandom. On FreeBSD, the PRNG that supplies data for /dev/random uses the Yarrow algorithm, which is generally regarded as being cryptographically secure.
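
From Python, the operating system's random source is usually reached through os.urandom rather than by opening the device file directly; random.SystemRandom and the secrets module are both built on it. A minimal sketch of a 10-sided die roll using each, assuming Python 3.6 or later (the names system_rng, roll and other_roll are illustrative):

```python
import random
import secrets

# SystemRandom draws from the OS random source (os.urandom) instead of
# the Mersenne Twister, but offers the same methods as the random module.
system_rng = random.SystemRandom()
roll = system_rng.randint(1, 10)

# secrets.randbelow(10) returns an int in 0..9, so add 1 for a die face.
other_roll = secrets.randbelow(10) + 1

print(roll, other_roll)
```

Both are noticeably slower than random.randint and cannot be seeded, so they are a poor fit for reproducible simulations.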

I ran some tests on bytes from random.randint. First, I created a million random bytes:

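import random

# Generate one million bytes, each drawn uniformly from 0..255.
ba = bytearray(random.randint(0, 255) for _ in range(1000000))
# Write in binary mode so the bytes reach the file unchanged.
with open('randint.dat', 'wb') as f:
    f.write(ba)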

Then I ran the ent program from Fourmilab on it:

Entropy = 7.999840 bits per byte.

Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.

Chi square distribution for 1000000 samples is 221.87, and randomly
would exceed this value 93.40 percent of the times.

Arithmetic mean value of data bytes is 127.5136 (127.5 = random).
Monte Carlo value for Pi is 3.139644559 (error 0.06 percent).
Serial correlation coefficient is -0.000931 (totally uncorrelated = 0.0).

Now for the χ² test: the further the percentage is from 50%, the more suspect the data. If one is very fussy, values below 10% or above 90% are deemed unacceptable. John Walker, the author of ent, calls a value like the 93.40% above "almost suspect".
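
To get a feel for what that percentage measures, the χ² statistic for byte data can be computed directly in Python. The sketch below is not ent's implementation: it assumes a uniform expectation over the 256 byte values and approximates the upper-tail probability with the Wilson-Hilferty normal approximation instead of the exact χ² distribution. The function names are illustrative.

```python
import random
from collections import Counter
from statistics import NormalDist


def chi_square_statistic(data: bytes) -> float:
    """Pearson chi-square statistic of byte counts against a uniform expectation."""
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))


def upper_tail_probability(statistic: float, dof: int = 255) -> float:
    """Approximate P(chi-square >= statistic) via the Wilson-Hilferty transform."""
    variance = 2 / (9 * dof)
    z = ((statistic / dof) ** (1 / 3) - (1 - variance)) / variance ** 0.5
    return 1 - NormalDist().cdf(z)


# One million bytes from random.randint, as in the test above.
sample = bytes(random.randint(0, 255) for _ in range(1_000_000))
stat = chi_square_statistic(sample)
print(f"chi-square = {stat:.2f}, "
      f"exceeded {upper_tail_probability(stat):.2%} of the time")
```

Feeding the statistic reported above into the approximation, upper_tail_probability(221.87), gives roughly 0.934, in line with the 93.40 percent that ent printed. Each run draws a fresh sample, so the percentage will vary from run to run.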

By contrast, here is the same analysis of 10 MiB from FreeBSD's Yarrow PRNG that I ran earlier:

Entropy = 7.999982 bits per byte.

Optimum compression would reduce the size
of this 10485760 byte file by 0 percent.

Chi square distribution for 10485760 samples is 259.03, and randomly
would exceed this value 41.80 percent of the times.

Arithmetic mean value of data bytes is 127.5116 (127.5 = random).
Monte Carlo value for Pi is 3.139877754 (error 0.05 percent).
Serial correlation coefficient is -0.000296 (totally uncorrelated = 0.0).

While the other figures do not differ much, the χ² percentage is much closer to 50%.
