What is the difference between replicating n times and generating n values directly when sampling in R?


Question

I am asked to "simulate x as an independent, identically distributed (iid) normal variable with mean = 0, std = 1.5, and sample length 500".

I am doing the sampling in the following two ways:

set.seed(8402)
X <- rnorm(500, 0, 1.5)
head(X)

I get

-1.8297969 -0.1862884 1.4219400 -1.0841421 -1.5276701 1.6159368

However, if I do

X <- replicate(500, rnorm(1,0,1.5))
head(X)

I get

-0.04032755 0.92002552 -2.28001943 -1.36840869 1.49820718 0.06205003

My question is: what is the right way to generate iid normal variables? What is the difference between these two approaches?

Thanks a lot!

Recommended answer

R internals

Internally in R, the C function double rnorm(double mean, double sd) from <Rmath.h> generates one random number at a time. When you call its R wrapper rnorm(n, mean, sd), the wrapper calls the C-level function n times.

This is the same as calling the R-level function with n = 1 and repeating that call n times with replicate.

The first method is much faster (the difference becomes noticeable when n is really large), because the whole loop runs at C level. replicate, however, is a wrapper around sapply, so it is not really a vectorized function (see the question "Is the "*apply" family really not vectorized?").
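To get a rough feel for the speed difference, the two approaches can be timed side by side. This is only a sketch: the sample size 1e5 is an arbitrary choice, and the actual timings depend on the machine and R version, so no specific numbers are claimed here.

```r
n <- 1e5  # arbitrary size, chosen only so the gap is visible

# One call: the loop over n draws happens at C level
t_vec <- system.time(x_vec <- rnorm(n, mean = 0, sd = 1.5))

# n calls: each draw goes through an R-level function call via replicate()
t_rep <- system.time(x_rep <- replicate(n, rnorm(1, mean = 0, sd = 1.5)))

cat("rnorm(n):            ", t_vec[["elapsed"]], "seconds\n")
cat("replicate(n, rnorm): ", t_rep[["elapsed"]], "seconds\n")
```

Both calls return a numeric vector of length n; only the cost of producing it differs.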

In addition, if you set the same random seed before each of the two, you are going to get the same set of random numbers.
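Applied to the question's own setup, this can be checked directly. The sketch below reuses the seed 8402 and the parameters from the question; the only change is that the seed is reset before the second approach, which the question's code did not do.

```r
n <- 500

set.seed(8402)
x_vec <- rnorm(n, mean = 0, sd = 1.5)

set.seed(8402)  # reset to the same seed before the second approach
x_rep <- replicate(n, rnorm(1, mean = 0, sd = 1.5))

# Compare the two samples element by element
identical(x_vec, x_rep)
```

The different values in the question come from the second call continuing the random number stream where the first call left off, not from a difference between the two methods.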

A more illustrative experiment

In my comment below, I say that the random seed is only set once, on entry. To help people understand this, I provide this example. There is no need to use a large n; n = 4 is sufficient.

First, let's set the seed to 0 and generate 4 standard normal samples:

set.seed(0); rnorm(4, 0, 1)
## we get
[1]  1.2629543 -0.3262334  1.3297993  1.2724293

Note that in this case, all 4 numbers are obtained from the entry seed 0.

Now, let's do this:

set.seed(0)
rnorm(2, 0, 1)
## we get
[1]  1.2629543 -0.3262334
## do not reset seed, but continue with the previous seed
replicate(2, rnorm(1, 0, 1))
## we get
[1] 1.329799 1.272429

See? The last two numbers are the same as the third and fourth numbers from the single call above.

But if we reset the seed in the middle, for example by setting it back to 0:

set.seed(0)
rnorm(2, 0, 1)
## we get
[1]  1.2629543 -0.3262334
## reset seed
set.seed(0)
replicate(2, rnorm(1, 0, 1))
## we get
[1]  1.2629543 -0.3262334

This is what I mean by "entry".
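Another way to look at this is through the generator state itself, which R stores in the variable .Random.seed. The sketch below assumes R's default generator settings and compares the state after drawing 4 numbers in one call with the state after drawing 2 and then 2 more through replicate; the names state_a and state_b are illustrative.

```r
# Draw 4 numbers in one call and record the generator state
set.seed(0)
a <- rnorm(4, 0, 1)
state_a <- .Random.seed

# Draw 2 numbers, then 2 more via replicate, without resetting the seed
set.seed(0)
b <- c(rnorm(2, 0, 1), replicate(2, rnorm(1, 0, 1)))
state_b <- .Random.seed

identical(a, b)              # same four numbers
identical(state_a, state_b)  # generator has advanced to the same state
```

The seed only fixes the starting state. Every draw after that advances the same stream, regardless of whether the draws come from one call or several.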
