How to generate a target number of samples from a distribution under a rejection criterion

Question

I am trying rnbinom:

x <- rnbinom(500, mu = 4, size = .1)
xtrunc <- x[x>0]

Then I get only 125 observations.

However, I want 500 observations excluding 0 (zero), under the same conditions (mu = 4, size = .1).

Recommended answer

This does the job:

N <- 500    ## target number of samples

## set seed for reproducibility
set.seed(0)
## first try
x <- rnbinom(N, mu = 4, size = .1)
p_accept <- mean(success <- x > 0)  ## estimated probability of accepting samples
xtrunc <- x[success]
## estimated further sampling times
n_further <- (N - length(xtrunc)) / p_accept
## further sampling
alpha <- 1.5   ## inflation factor
x_further <- rnbinom(round(alpha * n_further), mu = 4, size = .1)
## filter and combine
xtrunc <- c(xtrunc, (x_further[x_further > 0])[seq_len(N - length(xtrunc))])

## checking
length(xtrunc)
# [1] 500

summary(xtrunc)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   1.00    2.00    5.00   12.99   16.00  131.00 






In the above, sampling takes two stages. The result of the initial stage is used to estimate the acceptance probability, which then guides how many draws to make in the second stage. The inflation factor alpha oversamples in the second stage so that it is unlikely to fall short of the target.
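The two-stage idea can also be wrapped in a loop, so that the target is still reached in the rare case where the second batch falls short (in the code above, indexing past the end of the filtered vector would then produce NA values). The following is only a sketch: the helper name `rnbinom_positive` and the fallback used when a batch accepts nothing are assumptions made for illustration, not part of the original answer.

```r
## Hypothetical helper (not from the original answer): repeat the
## estimate-then-top-up step until N positive samples are collected.
rnbinom_positive <- function(N, mu, size, alpha = 1.5) {
  out <- integer(0)
  n_draw <- N                          # first batch: N raw draws
  while (length(out) < N) {
    x <- rnbinom(n_draw, mu = mu, size = size)
    keep <- x[x > 0]                   # rejection step: drop zeros
    out <- c(out, keep)
    ## acceptance rate estimated from this batch; use a small positive
    ## fallback if nothing was accepted, to avoid division by zero
    p_hat <- max(length(keep) / n_draw, 1 / (n_draw + 1))
    ## inflated estimate of how many more raw draws are needed
    n_draw <- max(1, ceiling(alpha * (N - length(out)) / p_hat))
  }
  out[seq_len(N)]                      # trim any surplus
}

set.seed(0)
xtrunc <- rnbinom_positive(500, mu = 4, size = .1)
length(xtrunc)
```

With alpha = 1.5 the loop will usually finish after the second batch, so the cost is close to that of the two-stage code.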

However, since the underlying distribution is explicitly known, the theoretical acceptance probability is known as well. Therefore, there is no need for a two-stage approach in this case. Try:

p <- 1 - pnbinom(0, mu = 4, size = .1)  ## theoretical probability
alpha <- 1.5
n_try <- round(alpha * N / p)
set.seed(0)
x <- rnbinom(n_try, mu = 4, size = .1)
xtrunc <- (x[x > 0])[1:N]

## checking
length(xtrunc)
# [1] 500

summary(xtrunc)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   1.00    2.00    5.00   12.99   16.00  131.00






The idea behind this is the theory of the geometric distribution. My answer here is closely related. Read its "More efficient vectorized method" section for a detailed explanation.
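To make the geometric-distribution argument concrete: each raw draw is accepted independently with probability p, so the number of raw draws needed per accepted sample (counting the accepted one) has mean 1/p, and the number of accepted samples among n_try raw draws is binomial. The sketch below checks p against the closed form for P(X = 0) and estimates how likely n_try draws are to yield at least N accepted samples. The names `p_closed` and `p_enough` are introduced here for illustration only.

```r
N <- 500; mu <- 4; size <- .1; alpha <- 1.5

## acceptance probability P(X > 0); under the mu/size parameterisation,
## P(X = 0) = (size / (size + mu))^size
p <- 1 - pnbinom(0, mu = mu, size = size)
p_closed <- 1 - (size / (size + mu))^size
all.equal(p, p_closed)

## expected raw draws per accepted sample (mean geometric waiting time)
1 / p

## the accepted count among n_try raw draws is Binomial(n_try, p);
## probability that at least N of them are accepted
n_try <- round(alpha * N / p)
p_enough <- pbinom(N - 1, n_try, p, lower.tail = FALSE)
p_enough
```

Without inflation (alpha = 1), the expected number of accepted samples is only about N, so the chance of reaching the target in one batch would be roughly one half. That is the reason for oversampling by a factor such as 1.5.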
