Avoid basic rand() bias in Monte Carlo simulation?

Question

I am rewriting a Monte Carlo simulation from Objective-C to C, to use in a DLL called from VBA/Excel. The "engine" of the calculation generates a random number from 0 to 10000 and compares it to a variable in the 5000-7000 neighbourhood. This happens 400-800 times per iteration, and I run 100,000 iterations, so that is about 50,000,000 random numbers per run.

In Objective-C the tests showed no bias, but I have huge problems with the C code. Objective-C is a superset of C, so 95% of the code was copy-paste and hard to screw up. I have gone through the rest many times, all day yesterday and today, and found no problems.

That leaves the difference between arc4random_uniform() and rand() seeded with srand(), especially because the bias is towards the lower numbers of the 0 to 10000 range. The tests I have run are consistent with a bias of 0.5 to 2% towards numbers below roughly 5000. The only other explanation would be that my code somehow avoids repeats, which I don't think it does.

The code is really simple ("spiller1evne" and "spiller2evne" are numbers between 5500 and 6500):

srand((unsigned)time(NULL));
for (j = 0; j < antala; ++j) {
    [..]
    for (i = 1; i < 450; i++) {
        chance = (rand() % 10001);

        [..]

        if (grey == 1) {
            if (chance < spiller1evnea) vinder = 1;
            else vinder = 2;
        }
        else {
            if (chance < spiller2evnea) vinder = 2;
            else vinder = 1;
        }

Now, I don't need true randomness; pseudorandomness is quite fine. I only need the numbers to be approximately evenly distributed on a cumulative basis. It doesn't matter much if 5555 is twice as likely to come out as 5556. It does matter if 5500-5599 is 5% more likely than 5600-5699, or if there is a clear 0.5-2% bias towards 0-4000 compared with 6000-9999.
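One way to measure this kind of cumulative bias directly is to count how often a draw lands below a threshold and compare that with threshold / range. The following is a minimal sketch, not the test the asker actually ran; the function name fraction_below and its parameters are hypothetical, and the generator is passed in as a function pointer so that rand or a replacement can be checked the same way.

```c
#include <stdlib.h>

/*
 * Draws n values as next() % range and returns the fraction of them
 * that fall below threshold. next is any function with the same
 * signature as rand(). Assumes n > 0 and range > 0.
 */
double fraction_below(int (*next)(void), long n, int range, int threshold) {
    long hits = 0;
    long k;
    for (k = 0; k < n; ++k) {
        if (next() % range < threshold) {
            ++hits;
        }
    }
    return (double)hits / (double)n;
}
```

For example, fraction_below(rand, 50000000L, 10001, 6000) can be compared with the ideal value 6000.0 / 10001. A gap that stays put as n grows points at the generator or at the way its output is mapped to the range, rather than at sampling noise.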

First, does it sound plausible that rand() is my problem? And is there an easy implementation that meets my modest needs?

Edit: if my suspicion is plausible, could I use anything from this page:

http://www.azillionmonkeys.com/qed/random.html

Would I be able to just copy-paste this in as a replacement (I am writing in C and using Visual Studio, and I am really a novice)?

#include <stdlib.h>

#define RS_SCALE (1.0 / (1.0 + RAND_MAX))

double drand (void) {
    double d;
    do {
       d = (((rand () * RS_SCALE) + rand ()) * RS_SCALE + rand ()) * RS_SCALE;
    } while (d >= 1); /* Round off */
    return d;
}

#define irand(x) ((unsigned int) ((x) * drand ()))

Edit 2: Clearly the above code works without the same bias, so I would recommend it to anyone with the same "middle-of-the-road" need I described above. It does come with a penalty, as it calls rand() three times. So I am still looking for a faster solution.

Recommended answer

The rand() function generates an int in the range [0, RAND_MAX]. If you convert this to a different range via the modulus operator (%), as your original code does, then that introduces non-uniformity unless the size of your target range happens to evenly divide RAND_MAX + 1: the lowest (RAND_MAX + 1) % size values each get one more source value mapped to them than the rest. That sounds like exactly what you see.
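The size of the effect can be worked out exactly rather than measured. The sketch below assumes RAND_MAX is 32767, the value used by the Microsoft C runtime that ships with Visual Studio; the helper names preimage_count and prob_below are made up for illustration, and rand_max is a parameter so that other values can be tried.

```c
/*
 * Number of values r in [0, rand_max] for which r % range == v.
 * Assumes 0 <= v < range.
 */
long long preimage_count(long long rand_max, long long range, long long v) {
    long long total = rand_max + 1;   /* number of distinct rand() outputs */
    long long full = total / range;   /* complete passes over 0..range-1 */
    long long extra = total % range;  /* outputs left over after those passes */
    return full + (v < extra ? 1 : 0);
}

/*
 * Exact probability that (r % range) < threshold when r is uniform
 * on [0, rand_max]. Assumes 0 <= threshold <= range.
 */
double prob_below(long long rand_max, long long range, long long threshold) {
    long long total = rand_max + 1;
    long long full = total / range;
    long long extra = total % range;
    long long hits = full * threshold + (threshold < extra ? threshold : extra);
    return (double)hits / (double)total;
}
```

With rand_max = 32767 and range = 10001, each of the values 0 to 2764 has 4 source values and each of 2765 to 10000 has 3, so prob_below(32767, 10001, 6000) comes out near 0.6337 instead of the ideal 6000/10001 ≈ 0.5999. With a 31-bit RAND_MAX the same calculation gives a far smaller distortion.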

You have multiple options, but if you want to stick with something based on rand() then I suggest this variation on your original approach:

/*
 * Returns a pseudo-random int selected from the uniform distribution
 * over the half-open interval [0, limit), provided that limit does not
 * exceed RAND_MAX.
 */
int range_rand(int limit) {
    int rand_bound = (RAND_MAX / limit) * limit;
    int r;
    while ((r = rand()) >= rand_bound) { /* empty */ }
    return r % limit;
}

Although in principle the number of rand() calls that each call to that function makes is unbounded, in practice the average is only slightly greater than 1 for relatively small limit values, and it never exceeds 2 for any limit up to RAND_MAX. It removes the non-uniformity I described earlier by choosing the initial random number from a subset of [0, RAND_MAX] whose size is an exact multiple of limit.
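The cost of the rejection loop can be estimated with a little arithmetic: each rand() draw is accepted with probability rand_bound / (RAND_MAX + 1), and the number of draws until the first acceptance follows a geometric distribution with mean 1/p. The sketch below uses hypothetical helper names (accept_prob, expected_calls) and takes rand_max as a parameter instead of reading RAND_MAX, so the figures can be checked for any platform.

```c
/*
 * Probability that a single rand() draw is accepted by a loop that keeps
 * only values below (rand_max / limit) * limit.
 * Assumes 1 <= limit <= rand_max.
 */
double accept_prob(long long rand_max, long long limit) {
    long long rand_bound = (rand_max / limit) * limit;
    return (double)rand_bound / (double)(rand_max + 1);
}

/* Mean number of rand() draws per returned value (geometric mean 1/p). */
double expected_calls(long long rand_max, long long limit) {
    return 1.0 / accept_prob(rand_max, limit);
}
```

For limit = 10001 and an assumed RAND_MAX of 32767, rand_bound is 30003 and the mean is about 1.09 draws per result. With a 31-bit RAND_MAX it is barely above 1. The most expensive case is a limit of exactly half of RAND_MAX + 1 (16384 when RAND_MAX is 32767), where half of the draws are rejected and the mean is 2.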
