Understanding how pseudo random numbers in Matlab imply statistical independence


Question

Consider the following Matlab code, in which I generate some data using a pseudo random number generator. I would like your help to understand how random these numbers are from a statistical point of view, in the terms I explain below.

I first set some parameters:

%%%%%%%%Parameters
clear
rng default
Xsup=-1:6; 
Zsup=1:10; 
n_m=200; 
n_w=200; 
R=n_m;

Then I generate the data:

%%%%%%%%Creation of data [XZ,etapair,zetapair,etasingle,zetasingle]

%Vector X of dimension n_mx1
idX=randi(size(Xsup,2),n_m,1); %n_mx1
X=Xsup(idX).'; %n_mx1

%Vector Z of dimension n_wx1
idZ=randi(size(Zsup,2),n_w,1); 
Z=Zsup(idZ).'; %n_wx1

%Combine X and Z in a matrix XZ of dimension (n_m*n_w)x2
%which lists all possible combinations of values in X and Z
[cX, cZ] = ndgrid(X,Z);
XZ = [cX(:), cZ(:)]; %(n_m*n_w)x2

%Vector etapair of dimension (n_m*n_w)x1
etapair=randn(n_m*n_w,1); %(n_m*n_w)x1

%Vector zetapair of dimension (n_m*n_w)x1
zetapair=randn(n_m*n_w,1); %(n_m*n_w)x1

%Vector etasingle of dimension (n_m*n_w)x1
etasingle=max(randn(n_m,R),[],2); %n_mx1 
etasingle=repmat(etasingle, n_w,1); %(n_m*n_w)x1

%Vector zetasingle of dimension (n_m*n_w)x1
zetasingle=max(randn(n_w,R),[],2); %n_wx1
zetasingle=kron(zetasingle, ones(n_m,1)); %(n_m*n_w)x1

Let me now translate these draws into statistical terms. For t=1,...,n_w*n_m:

X(t) can be thought of as a realisation of a random variable X_t

Z(t) can be thought of as a realisation of a random variable Z_t

etapair(t) can be thought of as a realisation of a random variable E_t

zetapair(t) can be thought of as a realisation of a random variable Q_t

etasingle(t) can be thought of as a realisation of a random variable Y_t

zetasingle(t) can be thought of as a realisation of a random variable S_t

My belief was that the pseudo random number generator in Matlab allows one to claim that (X_1,X_2,..., Z_1,Z_2,..., E_1,E_2,..., Q_1,Q_2,..., Y_1,Y_2,..., S_1,S_2,...) are mutually independent, as explained here

As a check of this hypothetical claim, I define W_t:=-E_t-Q_t+Y_t+S_t and empirically compute Pr(W_t<=1|X_t=5, Z_t=1).

If mutual independence holds, then Pr(W_t<=1|X_t=5, Z_t=1)=Pr(W_t<=1), and their empirical counterparts below, named option1 and option2, should be ALMOST the same.

%option 1
num1=zeros(n_m*n_w,1);
for h=1:n_m*n_w
    if -etapair(h)-zetapair(h)+etasingle(h)+zetasingle(h)<=1 && XZ(h,1)==5 && XZ(h,2)==1
        num1(h)=1;
    end
end
den1=zeros(n_m*n_w,1);
for h=1:n_m*n_w
    if  XZ(h,1)==5 && XZ(h,2)==1
        den1(h)=1;
    end
end
option1=sum(num1)/sum(den1);

%option 2
num2=zeros(n_m*n_w,1);
for h=1:n_m*n_w
    if -etapair(h)-zetapair(h)+etasingle(h)+zetasingle(h)<=1 
        num2(h)=1;
    end
end
option2=sum(num2)/(n_m*n_w);

Question: is the difference between option1 (=0.0021) and option2 (=0.0012) covered by the "ALMOST", or am I doing something wrong?

Recommended answer

By the very nature of observing random events, you cannot guarantee theoretically accurate results for a given empirical trial.
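To get a feel for how much a single trial can vary, here is a rough back-of-the-envelope sketch. It is not part of the original answer and rests on several assumptions: it treats option2 from the question (0.0012) as a stand-in for the true probability, uses the expected rather than the realised number of pairs with X=5 and Z=1, and applies a plain binomial standard error. The binomial formula ignores the fact that etasingle and zetasingle are repeated across pairs, so the real variability is likely somewhat larger.

```matlab
% Rough standard error of option1 under a binomial approximation (an assumption).
n_m = 200; n_w = 200;            % sample sizes from the question
nX  = 8;   nZ  = 10;             % numel(-1:6) and numel(1:10)
p   = 0.0012;                    % option2 from the question, used as a stand-in for Pr(W_t<=1)

n_cond = (n_m/nX) * (n_w/nZ);    % expected number of pairs with X=5 and Z=1
se     = sqrt(p*(1-p)/n_cond);   % binomial standard error of a proportion
gap    = 0.0021 - 0.0012;        % observed difference option1 - option2

fprintf('Expected conditioning pairs: %g\n', n_cond);
fprintf('Approximate standard error:  %.4f\n', se);
fprintf('Observed gap:                %.4f\n', gap);
```

Under these assumptions option1 is an average over roughly 500 pairs, with a standard error of about 0.0015. The observed gap of 0.0009 is smaller than that, so it is well within what one trial can produce by chance.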

You have set rng default at the start of your script, which means you will always get the same result (option1 = 0.0021, option2 = 0.0012) every time you run it.
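The effect of resetting the generator can be seen in a minimal sketch. The variable names a, b, c and same are only illustrative, and rng('default') is the function form of the rng default command used in the question.

```matlab
rng('default');          % reset the generator to its default seed
a = randn(3,1);

rng('default');          % reset again: the same sequence starts over
b = randn(3,1);

same = isequal(a, b);    % the two draws are identical
fprintf('Draws identical after reset: %d\n', same);

rng('shuffle');          % seed from the current time instead
c = randn(3,1);          % these draws differ from run to run
```

Because the seed is fixed, rerunning the original script repeats one particular trial and does not produce a new one.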

Running your script many times and averaging the results, we should approach the theoretical values.

kk = 10000;
option1 = zeros(kk, 1);
option2 = zeros(kk, 1);
for ii = 1:kk
    % No need to use 'clear' here. If you were concerned 
    % for some reason, you could use 'clearvars -except kk option1 option2 ii'
    % do not use 'rng default'. Use 'rng shuffle' if anything, but not necessary
    Xsup = -1:6;
    % ... all your other code
    % replace 'option1=...' with 'option1(ii)=...'
    % replace 'option2=...' with 'option2(ii)=...'  
end
fprintf('Results:\nMean option1 = %f\nMean option2 = %f\n', mean(option1), mean(option2));

Results:

>> Mean option1 = 0.001461
>> Mean option2 = 0.001458

We can see these are the same to some degree of accuracy, and that accuracy can be made arbitrarily high by running more trials (a large enough kk). This is as expected for independent variables.
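For reference, the elided loop body can be filled in along the following lines. This is a sketch, not the answerer's exact script. It replaces the inner loops over h with logical indexing, which should give the same quantities, and it uses a much smaller kk (an assumption, chosen only so that it runs quickly). The names W and sel are introduced here for illustration.

```matlab
kk = 200;                          % number of trials (assumption: kept small for speed)
Xsup = -1:6;  Zsup = 1:10;
n_m = 200;  n_w = 200;  R = n_m;

option1 = nan(kk, 1);
option2 = zeros(kk, 1);

for ii = 1:kk
    % Draw X and Z and list all (X,Z) combinations, as in the question
    X = Xsup(randi(numel(Xsup), n_m, 1)).';
    Z = Zsup(randi(numel(Zsup), n_w, 1)).';
    [cX, cZ] = ndgrid(X, Z);
    XZ = [cX(:), cZ(:)];

    % Draw the shocks, as in the question
    etapair    = randn(n_m*n_w, 1);
    zetapair   = randn(n_m*n_w, 1);
    etasingle  = repmat(max(randn(n_m, R), [], 2), n_w, 1);
    zetasingle = kron(max(randn(n_w, R), [], 2), ones(n_m, 1));

    % W_t and the conditioning event X_t=5, Z_t=1
    W   = -etapair - zetapair + etasingle + zetasingle;
    sel = XZ(:,1) == 5 & XZ(:,2) == 1;

    option1(ii) = mean(W(sel) <= 1);   % NaN if no pair satisfies the condition
    option2(ii) = mean(W <= 1);
end

fprintf('Results:\nMean option1 = %f\nMean option2 = %f\n', ...
        mean(option1, 'omitnan'), mean(option2));
```

With fewer trials the two means will agree less closely than in the results quoted above. Increasing kk tightens the agreement at the cost of run time.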

Note: if you have the Parallel Computing Toolbox, this for loop can easily be swapped for a parfor loop, so the trials run many times faster.
