为什么rand()在Linux上比Mac重复数字的频率要高得多? [英] Why does rand() repeat numbers far more often on Linux than Mac?

查看:129
本文介绍了为什么rand()在Linux上比Mac重复数字的频率要高得多?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在C中实现一个哈希图,这是我正在研究的项目的一部分,并使用随机插入来对其进行测试.我注意到,Linux上的rand()似乎比Mac上的重复频率更高.在两个平台上,RAND_MAX均为2147483647/0x7FFFFFFF.我将其简化为该测试程序,该程序使字节数组RAND_MAX+1长,生成RAND_MAX随机数,记录每个数字是否重复,并从列表中将其检出.

I was implementing a hashmap in C as part of a project I'm working on and using random inserts to test it. I noticed that rand() on Linux seems to repeat numbers far more often than on Mac. RAND_MAX is 2147483647/0x7FFFFFFF on both platforms. I've reduced it to this test program that makes a byte array RAND_MAX+1-long, generates RAND_MAX random numbers, notes if each is a duplicate, and checks it off the list as seen.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main() {
    size_t size = ((size_t)RAND_MAX) + 1;
    char *randoms = calloc(size, sizeof(char));
    int dups = 0;
    srand(time(0));
    for (int i = 0; i < RAND_MAX; i++) {
        int r = rand();
        if (randoms[r]) {
            // printf("duplicate at %d\n", r);
            dups++;
        }
        randoms[r] = 1;
    }
    printf("duplicates: %d\n", dups);
}

Linux始终生成约7.9亿个重复项. Mac始终只生成一个,因此它循环遍历几乎可以生成 的每个随机数,而无需重复.谁能告诉我这是如何工作的?我无法说出与man页不同的任何内容,无法分辨每个正在使用的RNG,也无法在线找到任何内容.谢谢!

Linux consistently generates around 790 million duplicates. Mac consistently only generates one, so it loops through every random number that it can generate almost without repeating. Can anyone please explain to me how this works? I can't tell anything different from the man pages, can't tell which RNG each is using, and can't find anything online. Thanks!

推荐答案

一开始听起来好像macOS rand()对于不重复任何数字都更好,但应该注意的是,生成的数字数量是期望,可以看到很多重复项(实际上,大约有7.9亿,或者(2 31 -1)/ e ).同样,依次遍历数字也不会产生重复,但不会被认为是非常随机的.因此,在本测试中,Linux rand()实现 与真正的随机源是无法区分的,而macOS rand()并非如此.

While at first it may sound like the macOS rand() is somehow better for not repeating any numbers, one should note that with this amount of numbers generated it is expected to see plenty of duplicates (in fact, around 790 million, or (231-1)/e). Likewise iterating through the numbers in sequence would also produce no duplicates, but wouldn't be considered very random. So the Linux rand() implementation is in this test indistinguishable from a true random source, whereas the macOS rand() is not.

乍一看似乎令人惊讶的另一件事是macOS rand()如何能够很好地避免重复.查看其源代码,我们找到实现如下:

Another thing that appears surprising at first glance is how the macOS rand() can manage to avoid duplicates so well. Looking at its source code, we find the implementation to be as follows:

/*
 * Compute x = (7^5 * x) mod (2^31 - 1)
 * without overflowing 31 bits:
 *      (2^31 - 1) = 127773 * (7^5) + 2836
 * From "Random number generators: good ones are hard to find",
 * Park and Miller, Communications of the ACM, vol. 31, no. 10,
 * October 1988, p. 1195.
 */
    long hi, lo, x;

    /* Can't be initialized with 0, so use another value. */
    if (*ctx == 0)
        *ctx = 123459876;
    hi = *ctx / 127773;
    lo = *ctx % 127773;
    x = 16807 * lo - 2836 * hi;
    if (x < 0)
        x += 0x7fffffff;
    return ((*ctx = x) % ((unsigned long) RAND_MAX + 1));

在序列再次重复之前,确实确实会导致1到RAND_MAX之间的所有数字恰好包含一次.由于下一个状态基于乘法,因此该状态永远不能为零(否则所有将来的状态也将为零).因此,您看到的重复数字是第一个,而零是永不返回的数字.

This does indeed result in all numbers between 1 and RAND_MAX, inclusive, exactly once, before the sequence repeats again. Since the next state is based on multiplication, the state can never be zero (or all future states would also be zero). Thus the repeated number you see is the first one, and zero is the one that is never returned.

至少在存在macOS(或OS X)的情况下,Apple一直在其文档和示例中一直推广使用更好的随机数生成器,因此rand()的质量可能并不重要,因此我只是停留在可用的最简单的伪随机发生器之一上. (正如您所提到的,他们的rand()甚至被建议使用arc4random()来注释.)

Apple has been promoting the use of better random number generators in their documentation and examples for at least as long as macOS (or OS X) has existed, so the quality of rand() is probably not deemed important, and they've just stuck with one of the simplest pseudorandom generators available. (As you noted, their rand() is even commented with a recommendation to use arc4random() instead.)

在相关说明中,我可以找到的最简单的伪随机数生成器

On a related note, the simplest pseudorandom number generator I could find that produces decent results in this (and many other) tests for randomness is xorshift*:

uint64_t x = *ctx;
x ^= x >> 12;
x ^= x << 25;
x ^= x >> 27;
*ctx = x;
return (x * 0x2545F4914F6CDD1DUL) >> 33;

此实现导致测试中几乎有7.9亿重复.

This implementation results in almost exactly 790 million duplicates in your test.

这篇关于为什么rand()在Linux上比Mac重复数字的频率要高得多?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆