Understanding how pseudo random numbers in Matlab imply statistical independence
Problem description
Consider the following Matlab code, in which I generate some data using a pseudo random number generator. I would like your help to understand "how" random these numbers are from a statistical point of view, in the terms I explain below.
I first set some parameters
%%%%%%%%Parameters
clear
rng default
Xsup=-1:6;
Zsup=1:10;
n_m=200;
n_w=200;
R=n_m;
Then I generate the data
%%%%%%%%Creation of data [XZ,etapair,zetapair,etasingle,zetasingle]
%Vector X of dimension n_mx1
idX=randi(size(Xsup,2),n_m,1); %n_mx1
X=Xsup(idX).'; %n_mx1
%Vector Z of dimension n_wx1
idZ=randi(size(Zsup,2),n_w,1);
Z=Zsup(idZ).'; %n_wx1
%Combine X and Z in a matrix XZ of dimension (n_m*n_w)x2
%which lists all possible combinations of values in X and Z
[cX, cZ] = ndgrid(X,Z);
XZ = [cX(:), cZ(:)]; %(n_m*n_w)x2
%Vector etapair of dimension (n_m*n_w)x1
etapair=randn(n_m*n_w,1); %(n_m*n_w)x1
%Vector zetapair of dimension (n_m*n_w)x1
zetapair=randn(n_m*n_w,1); %(n_m*n_w)x1
%Vector etasingle of dimension (n_m*n_w)x1
etasingle=max(randn(n_m,R),[],2); %n_mx1
etasingle=repmat(etasingle, n_w,1); %(n_m*n_w)x1
%Vector zetasingle of dimension (n_m*n_w)x1
zetasingle=max(randn(n_w,R),[],2); %n_wx1
zetasingle=kron(zetasingle, ones(n_m,1)); %(n_m*n_w)x1
Let me now translate these draws into statistical terms:
For t=1,...,n_w*n_m, XZ(t,1) can be thought of as a realisation of a random variable X_t
For t=1,...,n_w*n_m, XZ(t,2) can be thought of as a realisation of a random variable Z_t
For t=1,...,n_w*n_m, etapair(t) can be thought of as a realisation of a random variable E_t
For t=1,...,n_w*n_m, zetapair(t) can be thought of as a realisation of a random variable Q_t
For t=1,...,n_w*n_m, etasingle(t) can be thought of as a realisation of a random variable Y_t
For t=1,...,n_w*n_m, zetasingle(t) can be thought of as a realisation of a random variable S_t
My belief was that the pseudo random number generator in Matlab allows one to claim that
(X_1,X_2,..., Z_1,Z_2,..., E_1,E_2,..., Q_1,Q_2,..., Y_1,Y_2,..., S_1,S_2,...)
are mutually independent, as explained here.
As a check of this hypothetical claim, I define W_t:=-E_t-Q_t+Y_t+S_t and empirically compute Pr(W_t<=1|X_t=5, Z_t=1).
If mutual independence holds, then Pr(W_t<=1|X_t=5, Z_t=1)=Pr(W_t<=1), and their empirical counterparts below, named option1 and option2, should be ALMOST the same.
%option 1
num1=zeros(n_m*n_w,1);
for h=1:n_m*n_w
    if -etapair(h)-zetapair(h)+etasingle(h)+zetasingle(h)<=1 && XZ(h,1)==5 && XZ(h,2)==1
        num1(h)=1;
    end
end
den1=zeros(n_m*n_w,1);
for h=1:n_m*n_w
    if XZ(h,1)==5 && XZ(h,2)==1
        den1(h)=1;
    end
end
option1=sum(num1)/sum(den1);
%option 2
num2=zeros(n_m*n_w,1);
for h=1:n_m*n_w
    if -etapair(h)-zetapair(h)+etasingle(h)+zetasingle(h)<=1
        num2(h)=1;
    end
end
option2=sum(num2)/(n_m*n_w);
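The two loops above can also be written with logical indexing. The sketch below is one possible way to do that, not the code from the post: it regenerates the data with smaller, hypothetical sizes (n_m = n_w = 50) so that it runs on its own, and the names W and cond are introduced here purely for illustration.

```matlab
% Sketch only: a vectorized equivalent of the option1/option2 loops.
% Sizes are hypothetical and smaller than in the post (which uses 200).
rng default
Xsup = -1:6;  Zsup = 1:10;
n_m = 50;  n_w = 50;  R = n_m;

% Same data-generating steps as in the post
X = Xsup(randi(numel(Xsup), n_m, 1)).';                       % n_m x 1
Z = Zsup(randi(numel(Zsup), n_w, 1)).';                       % n_w x 1
[cX, cZ] = ndgrid(X, Z);
XZ = [cX(:), cZ(:)];                                          % (n_m*n_w) x 2
etapair    = randn(n_m*n_w, 1);
zetapair   = randn(n_m*n_w, 1);
etasingle  = repmat(max(randn(n_m, R), [], 2), n_w, 1);
zetasingle = kron(max(randn(n_w, R), [], 2), ones(n_m, 1));

% W and cond are illustrative names, not from the post
W    = -etapair - zetapair + etasingle + zetasingle;          % realisations of W_t
cond = XZ(:,1) == 5 & XZ(:,2) == 1;                           % rows with X_t=5 and Z_t=1

option1 = mean(W(cond) <= 1);   % conditional frequency; NaN if no row matches
option2 = mean(W <= 1);         % unconditional frequency
```

Taking the mean of a logical vector gives the fraction of true entries, which is the same ratio that sum(num1)/sum(den1) and sum(num2)/(n_m*n_w) compute in the loops.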
Question: is the difference between option1 (=0.0021) and option2 (=0.0012) what the "ALMOST" refers to, or am I doing something wrong?
Recommended answer
By the very nature of observing random events, you cannot guarantee theoretically accurate results for a given empirical trial.
You have set rng default at the start of your script, which means you will always get the same result (option1 = 0.0021, option2 = 0.0012).
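To get a feel for how far a single run can drift, a back-of-the-envelope calculation helps. The sketch below is an approximation under assumptions that are not in the original post: it treats the rows as independent Bernoulli draws, uses option2 = 0.0012 as a stand-in for Pr(W_t<=1), and takes the expected number of rows with X_t=5 and Z_t=1 from the 8 and 10 equally likely support points. The variable names are hypothetical.

```matlab
% Rough sketch: binomial standard error of option1 in a single run.
% Assumption: rows behave like independent Bernoulli trials (a simplification).
p      = 0.0012;                   % option2 from the single run, used as a stand-in
n_m    = 200;  n_w = 200;
n_cond = (n_m/8) * (n_w/10);       % expected number of rows with X_t=5 and Z_t=1
se1    = sqrt(p*(1-p)/n_cond);     % approximate standard error of option1
gap    = abs(0.0021 - 0.0012);     % observed difference between the two options

fprintf('Expected conditioning rows: %g\n', n_cond);
fprintf('Approx. standard error of option1: %.4f\n', se1);
fprintf('Observed gap: %.4f\n', gap);
```

Under these assumptions option1 rests on only about 500 rows and has a standard error of about 0.0015, so a gap of 0.0009 is well within ordinary sampling noise. Because etasingle and zetasingle are repeated across rows, the rows are not truly independent, so the actual variability is, if anything, larger than this estimate.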
Running your script many times and averaging the results, we should approach theoretical accuracy.
kk = 10000;
option1 = zeros(kk, 1);
option2 = zeros(kk, 1);
for ii = 1:kk
    % No need to use 'clear' here. If you were concerned
    % for some reason, you could use 'clearvars -except kk option1 option2 ii'
    % do not use 'rng default'. Use 'rng shuffle' if anything, but not necessary
    Xsup = -1:6;
    % ... all your other code
    % replace 'option1=...' with 'option1(ii)=...'
    % replace 'option2=...' with 'option2(ii)=...'
end
fprintf('Results:\nMean option1 = %f\nMean option2 = %f\n', mean(option1), mean(option2));
Results:
>> Mean option1 = 0.001461
>> Mean option2 = 0.001458
We can see these are the same to some degree of accuracy, and that accuracy can be made arbitrarily high by running enough trials. This is as expected for independent variables.
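For readers who want something they can paste and run, here is one possible self-contained version of the repeated-trial idea. It is a sketch under stated assumptions rather than the answerer's exact script: kk, n_m and n_w are reduced to hypothetical small values so it finishes quickly, the seed rng(1) is an arbitrary choice, the loops are replaced by logical indexing, and trials in which no row has X_t=5 and Z_t=1 are skipped when averaging option1.

```matlab
% Sketch only: repeated trials with fresh draws in every iteration.
% kk, n_m, n_w and the seed are hypothetical choices for a quick run.
rng(1)                              % seed once, outside the loop
kk = 200;
Xsup = -1:6;  Zsup = 1:10;
n_m = 50;  n_w = 50;  R = n_m;
option1 = zeros(kk, 1);
option2 = zeros(kk, 1);
for ii = 1:kk
    X = Xsup(randi(numel(Xsup), n_m, 1)).';
    Z = Zsup(randi(numel(Zsup), n_w, 1)).';
    [cX, cZ] = ndgrid(X, Z);
    XZ = [cX(:), cZ(:)];
    etapair    = randn(n_m*n_w, 1);
    zetapair   = randn(n_m*n_w, 1);
    etasingle  = repmat(max(randn(n_m, R), [], 2), n_w, 1);
    zetasingle = kron(max(randn(n_w, R), [], 2), ones(n_m, 1));
    W    = -etapair - zetapair + etasingle + zetasingle;
    cond = XZ(:,1) == 5 & XZ(:,2) == 1;
    option1(ii) = mean(W(cond) <= 1);   % NaN when no row satisfies cond
    option2(ii) = mean(W <= 1);
end
fprintf('Mean option1 = %f\nMean option2 = %f\n', ...
        mean(option1, 'omitnan'), mean(option2));
```

With these smaller sizes the printed values will differ from the 0.001461 and 0.001458 reported above, since Pr(W_t<=1) depends on R; the point is only that the two averages move toward each other as kk grows.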
Note, if you have the Parallel Computing Toolbox, this for loop can easily be swapped for a parfor, and you can run the trials many times faster.
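A minimal sketch of what the parfor variant might look like is given below. It is an illustration, not the answerer's code: it uses small hypothetical sizes and tracks only option2 to stay short (option1 could be stored per iteration in the same way). Each iteration writes to its own element of option2, which is the pattern parfor needs for output arrays.

```matlab
% Sketch only: the trial loop written with parfor.
% kk, n_m and n_w are small hypothetical values chosen for a quick run.
kk = 100;
n_m = 50;  n_w = 50;  R = n_m;
option2 = zeros(kk, 1);             % sliced output: one element per iteration
parfor ii = 1:kk
    etapair    = randn(n_m*n_w, 1);
    zetapair   = randn(n_m*n_w, 1);
    etasingle  = repmat(max(randn(n_m, R), [], 2), n_w, 1);
    zetasingle = kron(max(randn(n_w, R), [], 2), ones(n_m, 1));
    W = -etapair - zetapair + etasingle + zetasingle;
    option2(ii) = mean(W <= 1);     % unconditional frequency for this trial
end
fprintf('Mean option2 = %f\n', mean(option2));
```

Random numbers drawn inside parfor come from the workers' own streams, so results will generally not match a serial run with the same seed; the actual speed-up depends on the number of workers and on pool start-up time.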