Generate a random number from a density object (or more broadly from a set of numbers)
Problem description
Let's say I have a set of numbers that I suspect come from the same distribution.
```r
set.seed(20130613)
x <- rcauchy(10)
```
I would like a function that randomly generates a number from that same unknown distribution. One approach I have thought of is to create a `density` object, get the CDF from it, and take the inverse CDF of a random uniform variable (see Wikipedia).
```r
den <- density(x)

#' Generate n random numbers from a density() object
#'
#' @param n The total number of random numbers to generate
#' @param den The density object from which to generate random numbers
rden <- function(n, den)
{
  diffs <- diff(den$x)
  # Make sure the grid has equal increments
  stopifnot(all(abs(diffs - mean(diffs)) < 1e-9))
  # Normalize the density values and accumulate them into a discrete CDF
  ydistr <- cumsum(den$y / sum(den$y))
  yunif <- runif(n)
  # Inverse CDF: index of the first grid point whose CDF exceeds each uniform
  # draw (clamped so rounding in the last CDF value cannot give an NA index)
  indices <- pmin(findInterval(yunif, ydistr) + 1L, length(den$x))
  den$x[indices]
}

rden(1, den)
## [1] -0.1854121
```
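Because `rden` returns `den$x[indices]`, every draw lands exactly on one of the grid points of the density object (512 by default). A minimal sketch of the same inverse-CDF idea that interpolates linearly between grid points instead, so draws are continuous; `rden_interp` is a hypothetical name, and the CDF is still the simple normalized cumulative sum used in `rden`, so treat the result as an approximation on that grid:

```r
# Hypothetical helper: inverse-transform sampling from a density() object,
# interpolating the numerical CDF linearly instead of snapping to grid points.
rden_interp <- function(n, den) {
  # Numerical CDF on the grid (cumulative sum, normalized to end at 1)
  cdf <- cumsum(den$y / sum(den$y))
  # Invert the CDF by mapping uniform draws onto x with linear interpolation.
  # ties = "ordered": cdf is already sorted; flat stretches (den$y == 0) are fine.
  # rule = 2: draws below cdf[1] are clamped to the first grid point, not NA.
  approx(x = cdf, y = den$x, xout = runif(n), ties = "ordered", rule = 2)$y
}

set.seed(20130613)
x <- rcauchy(10)
den <- density(x)
draws <- rden_interp(1000, den)
```

Every draw stays inside `range(den$x)`, since interpolation never leaves the grid and `rule = 2` clamps at the ends.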
My questions are:
- Is there a better (or built-in) way in R to generate a random number from a density object?
- Are there any other ideas on how to generate a random number from a set of numbers (besides `sample`)?
Recommended answer
To generate data from a density estimate, randomly choose one of the original data points and add a random "error" term drawn from the kernel of the density estimate. For the default "gaussian" kernel, this means choosing a random element of the original vector and adding a normal random variable with mean 0 and standard deviation equal to the bandwidth used:
```r
den <- density(x)
N <- 1000
newx <- sample(x, N, replace=TRUE) + rnorm(N, 0, den$bw)
```
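The one-liner above covers the Gaussian kernel. Since `density()` scales each kernel so that `bw` is its standard deviation, the same smoothed-bootstrap recipe extends to other kernels by changing only the noise term. A sketch for three of the kernels `density()` offers; `rkde` is a hypothetical helper, and the kernel is passed in explicitly because it has to match whatever was given to `density()`:

```r
# Hypothetical helper: smoothed bootstrap from a kernel density estimate.
# Resample the original points, then add kernel noise whose standard
# deviation equals bw (density() scales all of its kernels this way).
rkde <- function(n, x, bw, kernel = c("gaussian", "rectangular", "triangular")) {
  kernel <- match.arg(kernel)
  # Resample the observed values with replacement
  centres <- x[sample.int(length(x), n, replace = TRUE)]
  noise <- switch(kernel,
    # Normal noise with sd = bw
    gaussian = rnorm(n, 0, bw),
    # Uniform on [-a, a] has sd a / sqrt(3), so a = sqrt(3) * bw
    rectangular = runif(n, -sqrt(3) * bw, sqrt(3) * bw),
    # Sum of two uniforms on [-a/2, a/2] is triangular on [-a, a],
    # which has sd a / sqrt(6), so a = sqrt(6) * bw
    triangular = {
      a <- sqrt(6) * bw
      runif(n, -a / 2, a / 2) + runif(n, -a / 2, a / 2)
    }
  )
  centres + noise
}

set.seed(20130613)
x <- rcauchy(10)
den <- density(x, kernel = "triangular")
newx <- rkde(1000, x, den$bw, kernel = "triangular")
```

Indexing with `sample.int()` rather than `sample(x, ...)` avoids the `sample()` surprise where a length-one numeric `x` is treated as `1:x`.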
Another option is to fit a density with the `logspline` function from the logspline package (which uses a different method of estimating a density), then use the `rlogspline` function from that package to generate new data from the estimated density.
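A sketch of that route, assuming the CRAN package logspline is installed. A larger simulated Cauchy sample than the question's 10 points is used purely for illustration, and `qlogspline()` is included to show that pushing uniforms through the fitted quantile function is the inverse-CDF approach from the question:

```r
# Assumes the CRAN package 'logspline' is installed: install.packages("logspline")
library(logspline)

set.seed(20130613)
x <- rcauchy(200)  # illustrative sample, larger than the question's 10 points

# Fit a log-spline density estimate to the observed values
fit <- logspline(x)

# Draw new values from the fitted density
newx <- rlogspline(1000, fit)

# Inverse-CDF route: push uniform draws through the fitted quantile function
newx_q <- qlogspline(runif(1000), fit)
```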