How to skip a step and increase the number of iterations in a for loop in R


Problem description

We have a large for loop in R that simulates various data. In some iterations the data are generated in such a way that a quantity inside the loop comes out as 0, which is undesirable, so we want to skip that step of data generation. At the same time we need to increase the number of iterations by one for each skip; otherwise we end up with fewer observations than required.

For example, when running the following code, we get z = 0 in iterations 1, 8 and 9.

rm(list=ls())
n <- 10
z <- NULL
for(i in 1:n){
  set.seed(i)
  a <- rbinom(1,1,0.5)
  b <- rbinom(1,1,0.5)
  z[i] <- a+b
}
z
# [1] 0 1 1 1 1 2 1 0 0 1

We want to skip these steps so that no z = 0 remains, but we still want a vector z of length 10. This can be done in many ways. What I particularly want to see is how to stop the current iteration and skip it when z = 0 is encountered, move on to the next step, and ultimately obtain 10 observations for z.

Solution

Normally we do this with a while loop, because the number of iterations required is not known beforehand.

n <- 10L
z <- integer(n)
m <- 1L; i <- 0L
while (m <= n) {
  set.seed(i)
  z_i <- sum(rbinom(2L, 1, 0.5))
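  if (z_i > 0L) {
    z[m] <- z_i   # keep the non-zero draw
    m <- m + 1L   # advance the output position only when a draw is kept
  }
  i <- i + 1L     # always move on to the next seed
}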

Output:

z
# [1] 1 1 1 1 1 2 1 1 1 1

i
# [1] 14

So we draw 14 samples; 4 of them are 0 and the remaining 10 are retained.
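
If the outer for loop has to stay, one possible variation (a sketch, not part of the original answer) is to redraw inside each iteration with `repeat` until a non-zero value appears. It assumes a single `set.seed()` call for the whole simulation instead of one seed per iteration, and the counter `draws` is a hypothetical name introduced here only to record how many draws were needed.

```r
set.seed(1)    # assumption: one seed for the whole simulation
n <- 10L
z <- integer(n)
draws <- 0L    # hypothetical counter: total draws, including rejected ones

for (i in seq_len(n)) {
  repeat {
    z_i <- sum(rbinom(2L, 1, 0.5))
    draws <- draws + 1L
    if (z_i > 0L) break   # accept the first non-zero draw
  }
  z[i] <- z_i             # slot i is always filled with a non-zero value
}

z
draws
```

Each pass of the for loop fills exactly one slot, so `z` always ends up with length `n`, while `draws` shows how many extra iterations the rejected zeros cost.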


More efficient vectorized method

set.seed(0)
n <- 10L
z <- rbinom(n, 1, 0.5) + rbinom(n, 1, 0.5)
m <- length(z <- z[z > 0L])  ## filtered samples
p <- m / n  ## estimated success probability
k <- round(1.5 * (n - m) / p)   ## number of further draws, inflated by 1.5, aiming for (n - m) non-zero samples
z_more <- rbinom(k, 1, 0.5) + rbinom(k, 1, 0.5)
z <- c(z, z_more[which(z_more > 0)[seq_len(n - m)]])

This uses some probability theory of the geometric distribution. Initially we draw n samples, m of which are retained, so the estimated probability of accepting a sample is p <- m / n. For a geometric distribution, the expected number of draws needed to observe one success is 1/p. Therefore we need about (n - m) / p further draws to expect (n - m) successes. The 1.5 is just an inflation factor: drawing 1.5 times as many samples makes it very likely, though not certain, that we obtain (n - m) successes.
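
How safe the inflation factor is can be checked with the binomial distribution. The sketch below uses illustrative values that are not from the original answer: `n_m` (a hypothetical name) stands for n - m = 3 missing samples, and p is set to the true acceptance probability of this example, 1 - 0.5^2 = 0.75, instead of the estimate m / n.

```r
n_m <- 3L      # hypothetical number of missing samples, n - m
p   <- 0.75    # P(a + b > 0) = 1 - 0.5^2 in this example
k   <- round(1.5 * n_m / p)   # inflated number of further draws, as in the answer

# P(at least n_m successes in k draws) = 1 - P(at most n_m - 1 successes)
prob_enough <- pbinom(n_m - 1L, size = k, prob = p, lower.tail = FALSE)
prob_enough    # about 0.962
```

With these values k is 6, and roughly 4% of the time the extra batch would still contain fewer than 3 non-zero values, which is why a larger factor can be worthwhile for small n.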

According to the law of large numbers, the estimate of p becomes more precise as n grows. Therefore, this approach is stable for large n.

If you feel that 1.5 is not large enough, use 2 or 3. But my feeling is that it is sufficient.
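
When the extra batch happens to contain fewer than (n - m) non-zero values, indexing with `seq_len(n - m)` pads the result with `NA`. One way to guard against that, shown here as a sketch and not as the original author's method, is to keep topping up in batches until enough samples are accepted. The function name `sample_nonzero` and its `inflate` argument are hypothetical.

```r
# Hypothetical helper: draws until n non-zero values have been accepted.
sample_nonzero <- function(n, inflate = 1.5) {
  z <- integer(0)
  drawn <- 0L
  k <- n                                  # first batch: n draws
  while (length(z) < n) {
    batch <- rbinom(k, 1, 0.5) + rbinom(k, 1, 0.5)
    drawn <- drawn + k
    z <- c(z, batch[batch > 0L])          # keep only the non-zero draws
    p_hat <- max(length(z), 1) / drawn    # estimated acceptance probability, never 0
    k <- ceiling(inflate * (n - length(z)) / p_hat)  # size of the next top-up batch
  }
  z[seq_len(n)]                           # drop any surplus accepted draws
}

set.seed(0)
z <- sample_nonzero(10L)
z
```

The loop usually finishes after one or two batches, so the work stays vectorized, while the returned vector is guaranteed to have length n and no zeros.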
