Random integers in C: how bad is rand()%N compared to integer arithmetic? What are its flaws?

Question

My question is: rand()%N is considered very bad, whereas integer arithmetic is considered superior, but I cannot see the difference between the two.

People always mention:

  • the low bits are not random in rand()%N,
  • rand()%N is very predictable,
  • you can use it for games but not for cryptography.

Can someone explain whether any of these points apply here, and how to see that?

The non-randomness of the lower bits should make the permutation entropy (PE) of the two cases I show differ, but it does not.

I guess many people, like me, always avoid rand() or rand()%N because we have always been taught that it is pretty bad. I was curious to see how "wrong" the random integers generated with C's rand()%N actually are. This is also a follow-up to Ryan Reich's answer in "How to generate a random integer number from within a range".

The explanation there sounds very convincing, to be honest; nevertheless, I thought I'd give it a try. So I compare the distributions in a VERY naive way: I run both random generators for different numbers of samples and domains. I didn't see the point of computing a density instead of histograms, so I just computed histograms and, just by looking, I would say they both look equally uniform. The other point that was raised concerns the actual randomness of the sequence (as opposed to its uniform distribution). Again naively, I compute the permutation entropy for these runs. It is the same for both sample sets, which tells us that there is no difference between the two regarding the ordering of the occurrences.
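For reference, the normalized permutation entropy that the MATLAB script further down appears to be aiming for can be written as below. This is the usual Bandt-Pompe definition, stated here as an assumption about what was intended: $p(\pi)$ is the relative frequency of the ordinal pattern $\pi$ among all windows of $n$ consecutive samples, and $S_n$ is the set of all $n!$ orderings.

```latex
H_n = -\frac{1}{\log_2 n!} \sum_{\pi \in S_n} p(\pi)\,\log_2 p(\pi),
\qquad 0 \le H_n \le 1
```

$H_n = 1$ means all $n!$ orderings occur equally often. With a small domain such as (0,5), ties inside a window are common. Because sort breaks ties by position, some patterns are then favoured regardless of the generator, so values slightly below 1 are to be expected for both methods.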

So, for many purposes, it seems to me that rand()%N would be just fine. How can we see its flaws?

Here I show you a very simple, inefficient and not very elegant (but I think correct) way of computing these samples and obtaining the histograms together with the permutation entropies. I show plots for domains (0,i) with i in {5,10,25,50,100}, for different numbers of samples:

There's not much to see in the code, I guess, so I will leave both the C and the MATLAB code here for replication purposes.

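#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Usage: randomTest <mode> <max> <samples>
   mode 1: rand() % (max + 1); any other mode: integer arithmetic with rejection.
   Prints <samples> integers in [0, max], one per line. */
int main(int argc, char *argv[]){
        if(argc != 4){
                fprintf(stderr, "usage: %s <mode> <max> <samples>\n", argv[0]);
                return 1;
        }
        int mode = atoi(argv[1]);
        unsigned long max = strtoul(argv[2], NULL, 10);
        long samples = atol(argv[3]);
        if(max > RAND_MAX){ /* otherwise bin_size below would be 0 */
                fprintf(stderr, "max must not exceed RAND_MAX (%d)\n", RAND_MAX);
                return 1;
        }
        srand((unsigned)time(NULL));
        if(mode == 1){
                for(long i=0;i<samples;++i)
                        printf("%lu\n", (unsigned long)rand() % (max+1));
        }else{
                /* Split the RAND_MAX+1 raw values into max+1 equal bins and
                   reject the leftover values at the top of the range. */
                unsigned long
                num_bins = max + 1,
                num_rand = (unsigned long) RAND_MAX + 1,
                bin_size = num_rand / num_bins,
                defect   = num_rand % num_bins;
                for(long i=0;i<samples;++i){
                        unsigned long x;
                        do {
                                x = (unsigned long)rand();
                        }
                        while (num_rand - defect <= x);
                        printf("%lu\n", x/bin_size);
                }
        }
        return 0;
}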

And here is the MATLAB code to plot this and compute the PEs (I took the recursion for the permutations from https://www.mathworks.com/matlabcentral/answers/308255-how-to-generate-all-possible-permutations-without-using-the-function-perms-randperm):

% Build the C program and generate one sample file per method.
system('gcc randomTest.c -o randomTest.exe;');
max = 100;                 % note: shadows the built-in max() in this script
samples = max*10000;
trials = 200;
system(['./randomTest.exe 1 ' num2str(max) ' ' num2str(samples) ' > file1'])
system(['./randomTest.exe 2 ' num2str(max) ' ' num2str(samples) ' > file2'])
a1=load('file1');
a2=load('file2');
% Histograms of both sample sets.
uni = figure(1);
title(['Samples: ' num2str(samples)])
subplot(1,3,1)
h1 = histogram(a1,max+1);
title('rand%(max+1)')
subplot(1,3,2)
h2 = histogram(a2,max+1);
title('Integer arithmetic')
% Normalized permutation entropy for window lengths n = 3..8.
as=[a1,a2];
ns=3:8;
H = nan(ns(end),size(as,2));   % rows are indexed by n directly
for op=1:size(as,2)
    x = as(:,op);
    for n=ns
        sequenceOcurrence = zeros(1,factorial(n));
        sequences = myperms(1:n);   % all permutations of 1..n (helper from the link above)
        % Encode each permutation as a decimal number, e.g. [2 1 3] -> 213.
        sequencesArrayIdx = sum(sequences.*10.^(size(sequences,2)-1:-1:0),2);
        chunks = length(x) - n + 1; % number of windows of n consecutive samples
        for i=1:chunks
            [~,sequenceOrder] = sort(x(i:i+n-1));
            out = sequenceOrder'*10.^(numel(sequenceOrder)-1:-1:0).';
            sequenceOcurrence(sequencesArrayIdx == out) = sequenceOcurrence(sequencesArrayIdx == out) + 1;
        end
        ps = sequenceOcurrence/chunks;
        % Shannon entropy of the pattern frequencies; the minus sign makes it non-negative.
        hh = -sum(ps(logical(ps)).*log2(ps(logical(ps))));
        H(n,op) = hh/log2(factorial(n));   % normalize to [0,1]
    end
end
subplot(1,3,3)
plot(ns,H(ns,:),'--*','linewidth',2)
ylabel('PE')
xlabel('Sequence length')
filename = ['all_' num2str(max) '_' num2str(samples) ];
export_fig(filename)

Recommended answer

Because of the way modulo arithmetic works, if N is significant compared to RAND_MAX, taking %N makes some values considerably more likely than others. Imagine rand() produces 12 equally likely values, 0 through 11 (that is, RAND_MAX is 11), and N is 9. The raw values 9, 10 and 11 wrap around to 0, 1 and 2, so the chance of getting one of 0, 1, or 2 is 0.5, and the chance of getting one of 3, 4, 5, 6, 7, 8 is also 0.5. As a result, you are twice as likely to get a 0 as a 4. If N is an exact divisor of RAND_MAX + 1 (the number of values rand() can return), this distribution problem does not occur, and if N is very small compared to RAND_MAX the issue becomes less noticeable. RAND_MAX may not be particularly large (the C standard only guarantees at least 2^15 - 1 = 32767), which makes this problem worse than you might expect. The alternative of computing (rand() * n) / (RAND_MAX + 1) does not give an even distribution either; however, it is every mth value (for some m) that is more likely to occur, rather than the more likely values all sitting at the low end of the distribution.
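To make the counting argument concrete, here is a minimal sketch that enumerates every raw value of a toy generator and tallies where it lands under each reduction. The helper names count_modulo and count_scaled are hypothetical, and the toy range of 12 raw values (0 to 11) with n = 9 is the assumption used for illustration.

```c
#include <assert.h>

/* Hypothetical helper: for a generator that returns each value in
   [0, range) equally often, count how many raw values map to each
   result in [0, n) under raw % n. counts must have room for n entries. */
void count_modulo(unsigned range, unsigned n, unsigned *counts)
{
    assert(n > 0 && n <= range);
    for (unsigned i = 0; i < n; ++i)
        counts[i] = 0;
    for (unsigned raw = 0; raw < range; ++raw)
        counts[raw % n]++;
}

/* Same tally for the scaling approach (raw * n) / range.
   Only meant for small toy ranges: raw * n must not overflow. */
void count_scaled(unsigned range, unsigned n, unsigned *counts)
{
    assert(n > 0 && n <= range);
    for (unsigned i = 0; i < n; ++i)
        counts[i] = 0;
    for (unsigned raw = 0; raw < range; ++raw)
        counts[(raw * n) / range]++;
}
```

With range = 12 and n = 9, the modulo tally is 2 for results 0, 1 and 2 and 1 for results 3 to 8, so the surplus sits at the low end. The scaled tally is 2 for results 0, 3 and 6 and 1 elsewhere, so here every third value is the more likely one.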

If N is 75% of RAND_MAX, then the values in the bottom third of your distribution are twice as likely as the values in the top two thirds, because that is where the extra values map to.
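The 75% case can be checked without enumerating anything. The sketch below uses two hypothetical helpers, hits_modulo and favoured_results, and assumes a generator with 32768 equally likely raw values (RAND_MAX = 32767); neither the names nor that range come from the answer itself.

```c
#include <assert.h>

/* Hypothetical helper: number of raw values in [0, range) that
   satisfy raw % n == r, i.e. r, r + n, r + 2n, ... below range. */
unsigned long hits_modulo(unsigned long range, unsigned long n, unsigned long r)
{
    assert(n > 0 && r < n);
    if (r >= range)
        return 0;
    return (range - 1 - r) / n + 1;
}

/* Hypothetical helper: how many results receive one extra raw value.
   These are always the lowest results 0 .. (range % n) - 1. */
unsigned long favoured_results(unsigned long range, unsigned long n)
{
    assert(n > 0 && n <= range);
    return range % n;
}
```

With range = 32768 and n = 24576 (75% of the range), favoured_results gives 8192, which is exactly one third of n. hits_modulo returns 2 for results 0 to 8191 and 1 for results 8192 to 24575, matching the "bottom third is twice as likely" statement.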

The quality of rand() depends on the implementation on your system. I believe some systems have had very poor implementations; the OS X man pages declare rand obsolete. The Debian man page says the following:

The versions of rand() and srand() in the Linux C Library use the same random number generator as random(3) and srandom(3), so the lower-order bits should be as random as the higher-order bits. However, on older rand() implementations, and on current implementations on different systems, the lower-order bits are much less random than the higher-order bits. Do not use this function in applications intended to be portable when good randomness is needed. (Use random(3) instead.)
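To see what "much less random" low-order bits can look like, here is a sketch of a toy linear congruential generator. It is NOT the rand() of any particular library; toy_srand, toy_rand and low_bit_always_flips are hypothetical names, and the design (modulus 2^31, full state returned, low bits included) is an assumption chosen to resemble the older implementations the man page warns about.

```c
#include <assert.h>
#include <stdint.h>

/* Toy LCG state; hypothetical, for illustration only. */
static uint32_t toy_state = 1;

void toy_srand(uint32_t seed)
{
    toy_state = seed & 0x7fffffffu;
}

/* state = (a * state + c) mod 2^31, returned unmodified.
   With a and c both odd, the lowest bit flips on every call. */
uint32_t toy_rand(void)
{
    toy_state = (toy_state * 1103515245u + 12345u) & 0x7fffffffu;
    return toy_state;
}

/* Returns 1 if the lowest bit changed on each of `draws` consecutive
   calls, 0 as soon as two neighbouring outputs share the same low bit. */
int low_bit_always_flips(unsigned draws)
{
    assert(draws > 0);
    uint32_t prev = toy_rand() & 1u;
    for (unsigned i = 0; i < draws; ++i) {
        uint32_t cur = toy_rand() & 1u;
        if (cur == prev)
            return 0;
        prev = cur;
    }
    return 1;
}
```

For such a generator, toy_rand() % 2 simply alternates between 0 and 1, whatever the seed. Taking % N for any even N inherits that pattern in its lowest bit, whereas dividing into bins (as the integer arithmetic version does) relies on the higher-order bits instead.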
