How to "check" if a function really gives a random result?


Question

How can one be sure that a function is really random or as close to the notion as possible? Also, what is the distinction between random and pseudo-random? Finally, what algorithms/sources can be used to generate random numbers?

P.S.: I'm also asking this because a MySQL statement using ORDER BY RAND() LIMIT 1 isn't giving convincing results.

Solution

Aloha!

There are several methods and tools for testing randomness. They are applied to a set of numbers collected from the generator under test. That is, you test the generator based on a set of data it has generated.

In computing, especially for IT security, we normally want a generator that conforms to a uniform random process. There are many different processes, but I'm guessing that a uniform process is what you are aiming for.
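As a rough illustration of testing a generator against a uniform process, here is a minimal sketch of a chi-square goodness-of-fit check in Python. The function name `chi_square_uniform`, the bin count, and the sample size are my own assumptions for the example, and Python's `random.Random` merely stands in for whatever generator you want to test.

```python
import random

def chi_square_uniform(samples, num_bins, low, high):
    """Chi-square statistic of `samples` against a uniform distribution on [low, high)."""
    counts = [0] * num_bins
    width = (high - low) / num_bins
    for x in samples:
        # Clamp so that a value equal to `high` lands in the last bin.
        idx = min(int((x - low) / width), num_bins - 1)
        counts[idx] += 1
    expected = len(samples) / num_bins
    return sum((c - expected) ** 2 / expected for c in counts)

# Stand-in generator; replace with samples collected from the RNG under test.
rng = random.Random()
samples = [rng.random() for _ in range(100_000)]
stat = chi_square_uniform(samples, num_bins=10, low=0.0, high=1.0)

# With 10 bins there are 9 degrees of freedom; the critical value
# at the 0.1% significance level is roughly 27.88.
print(f"chi-square = {stat:.2f}, suspicious = {stat > 27.88}")
```

A single statistic like this only catches gross non-uniformity; it says nothing about correlations between successive values, which is why the larger test suites exist.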

NIST has published several documents with recommendations on both pseudo-random number generators and how to test them. Look at NIST documents SP 800-22 and SP 800-20.

As somebody else pointed out, if you want a True Random Number Generator (TRNG) you need to gather physical entropy. Examples of such sources are radioactive decay, cosmic radiation, lava lamps, etc. Preferably you want sources that are hard to manipulate. The IETF has an RFC with some good recommendations, see RFC 4086, Randomness Requirements for Security: http://tools.ietf.org/html/rfc4086

What you normally do is collect entropy from one or more (preferably more than one) sources. The collected data is then filtered (whitened) and finally used to periodically seed a good PRNG. With different seeds, naturally.
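The collect, whiten, and reseed pipeline described above might be sketched like this in Python. Everything here is illustrative and assumed for the example: the names `collect_raw_entropy`, `whiten`, and `ReseedingPrng` are hypothetical, the two sources are placeholders for real physical sources, and the hash-counter construction is a toy, not a vetted design. For real work, use your platform's generator (for example Python's `secrets` module).

```python
import hashlib
import os
import time

def collect_raw_entropy():
    """Gather raw bytes from several sources (placeholders for real entropy sources)."""
    return [
        os.urandom(32),                                # OS entropy pool
        time.perf_counter_ns().to_bytes(8, "little"),  # timing jitter (weak on its own)
    ]

def whiten(chunks):
    """Condense the raw chunks into a 32-byte seed using SHA-256."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(len(chunk).to_bytes(4, "little"))  # length prefix keeps chunks unambiguous
        h.update(chunk)
    return h.digest()

class ReseedingPrng:
    """Toy hash-counter generator that reseeds itself every `reseed_interval` blocks."""

    def __init__(self, reseed_interval=1024):
        self.reseed_interval = reseed_interval
        self._reseed()

    def _reseed(self):
        self.key = whiten(collect_raw_entropy())
        self.counter = 0

    def next_block(self):
        """Return 32 pseudo-random bytes, reseeding first if the interval is used up."""
        if self.counter >= self.reseed_interval:
            self._reseed()
        block = hashlib.sha256(self.key + self.counter.to_bytes(8, "little")).digest()
        self.counter += 1
        return block

prng = ReseedingPrng(reseed_interval=4)
print(prng.next_block().hex())
```

Mixing several sources through a hash means a single weak or manipulated source does not by itself determine the seed, which is the point of collecting from more than one.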

This is how most good modern random generators work: an entropy collector feeds a PRNG built from cryptographic primitives such as symmetric ciphers (AES, for example) or hash functions. See for example the random generator Yarrow/Fortuna by Schneier, which in modified form is used in FreeBSD.

Coming back to your question about testing: as somebody pointed out, Marsaglia produced a good set of tests, which was codified in the DIEHARD tests. There is now an even more expanded set of tests in Dieharder: http://www.phy.duke.edu/~rgb/General/dieharder.php

Dieharder is a good tool that will give you good confidence about whether the huge pile of numbers supplied to it (collected from your generator) is random (with good quality) or not. Running Dieharder is easy, but will take some time.
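To collect that pile of numbers, one option is to dump the generator's output to a raw binary file. The following is a minimal Python sketch under my own assumptions: the helper name `dump_uint32_samples`, the file name `samples.bin`, and the 32-bit little-endian format are all choices made for the example, and `random.Random` again stands in for the generator you actually want to test.

```python
import os
import random
import struct

def dump_uint32_samples(path, next_uint32, count):
    """Write `count` 32-bit generator outputs to `path` as raw little-endian bytes."""
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(struct.pack("<I", next_uint32()))

# Stand-in generator; replace the lambda with a call into the RNG under test.
rng = random.Random()
dump_uint32_samples("samples.bin", lambda: rng.getrandbits(32), count=1_000_000)
print(os.path.getsize("samples.bin"), "bytes written")
```

Dieharder can read such a file through its file-input mode (check its documentation for the exact options). It consumes a lot of data, so a real run typically needs a far larger file than this one; otherwise the input gets reused and the results become unreliable.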

In situ testing of randomness is hard. You don't normally want to implement Dieharder in your system. What you can do is implement some simple detectors that should catch pathological cases. I usually suggest:

  • Equal value length. A simple counter that is reset whenever two consecutive values generated by the RNG differ. You then need to define a threshold at which you consider the counter to show that the RNG is broken. If you see 10 million equal values and the value space is greater than one value (the one you see), your RNG is probably not working all that well, especially if the value you are seeing is one of the edge values, for example 0x00000... or 0xfffff...

  • Median value. If, after generating a million values that should be uniformly distributed, the median leans heavily towards one of the edges of the value space instead of sitting close to the middle, something is probably also amiss.

  • Variance. If, after generating millions of values, you haven't seen values close to the MIN and MAX of the value space, but instead have a narrow range of generated values, then something is also amiss.
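The three detectors above could be sketched roughly as follows in Python. The class name `RngHealthMonitor` and all thresholds (`max_run`, `median_tolerance`, `edge_fraction`) are hypothetical values picked for illustration; you would have to tune them for your own value space and sample counts. The sketch also keeps every sample in memory to compute the median, which a real embedded detector would replace with a streaming estimate.

```python
import random
import statistics

class RngHealthMonitor:
    """Cheap sanity checks for an RNG producing integers in [min_value, max_value]."""

    def __init__(self, min_value, max_value, max_run=32,
                 median_tolerance=0.05, edge_fraction=0.01):
        self.min_value = min_value
        self.max_value = max_value
        self.max_run = max_run                    # run length that counts as "stuck"
        self.median_tolerance = median_tolerance  # allowed median offset, as a fraction of the span
        self.edge_fraction = edge_fraction        # how close to MIN/MAX we expect to get
        self.last = None
        self.run = 0
        self.longest_run = 0
        self.samples = []

    def update(self, value):
        """Feed one generated value into the detectors."""
        # Equal value length: the counter restarts whenever the value changes.
        self.run = self.run + 1 if value == self.last else 1
        self.last = value
        self.longest_run = max(self.longest_run, self.run)
        self.samples.append(value)

    def report(self):
        """Return a dict of flags; True means that detector considers the RNG suspect."""
        span = self.max_value - self.min_value
        middle = self.min_value + span / 2
        median = statistics.median(self.samples)
        low_seen, high_seen = min(self.samples), max(self.samples)
        return {
            "stuck": self.longest_run >= self.max_run,
            "median_skewed": abs(median - middle) > self.median_tolerance * span,
            "range_too_narrow": (
                low_seen > self.min_value + self.edge_fraction * span
                or high_seen < self.max_value - self.edge_fraction * span
            ),
        }

monitor = RngHealthMonitor(0, 2**32 - 1)
rng = random.Random()  # stand-in for the generator or entropy source being watched
for _ in range(100_000):
    monitor.update(rng.getrandbits(32))
print(monitor.report())
```

These checks are deliberately crude: they are meant to catch a stuck or badly biased source, not to certify that the output is random.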

Finally, since you are hopefully using a good PRNG (based on AES, for example), the in situ tests suggested above might instead be applied to the entropy source.

I hope that helped in some ways.
