Simulate from kernel density estimator with variable underlying grid
Problem description
I have a dataset that I'm using to create an empirical probability distribution by estimating a kernel density. Right now I'm using R's kde2d from the MASS package. After estimating the probability distribution, I use sample to sample from slices of the 2D distribution along the x-axis. I use sample much like described here. Example code would look like this:
library(MASS)
set.seed(123)
x = rnorm(100, 1, 0.1)
set.seed(456)
y = rnorm(100, 1, 0.5)
den <- kde2d(x, y, n = 50, lims = c(-2, 2, -2, 2))
# to plot this 2d kde (persp is part of base graphics, so no extra package is needed):
# persp(den)
conditional_probabilty_density = list(x = den$y, y = den$z[40, ])
#to plot the slice:
#plot(conditional_probabilty_density)
simulated_sample = sample(conditional_probabilty_density$x, size = 10, replace = TRUE, prob = conditional_probabilty_density$y)
The den looks like this:
[Figure: plot of den]
My data has known areas where there are a lot of fluctuations, requiring a fine grid granularity. Other areas have basically no data points and nothing is going on there. I would be fine if I could just set the n parameter of kde2d to a very high number in order to have a good resolution of my data everywhere. Alas, this is not possible due to memory constraints.
That's why I thought I could modify the kde2d function to have a non-constant granularity.
Here is the source code of the kde2d function.
One can modify the line
gy <- seq.int(lims[3L], lims[4L], length.out = n[2L])
and put whatever granularity is wished for on the y-axis. For example
a <- seq(-1, 0, 0.5)
gy <- c(a, seq.int(0.1, 2, length.out = n[2L]-length(a)))
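The modification described above can also be sketched as a standalone function instead of an edited copy of kde2d. The helper name kde2d_grid and its arguments gx and gy are hypothetical (they are not part of MASS); the body follows the kde2d source as I understand it, including its division of the bandwidth by 4, so treat it as an illustration rather than a drop-in replacement.

```r
library(MASS)  # used here only for bandwidth.nrd()

# Hypothetical helper: a kde2d-style estimate evaluated on user-supplied
# grid vectors gx and gy, which may be unevenly spaced.
kde2d_grid <- function(x, y, gx, gy,
                       h = c(bandwidth.nrd(x), bandwidth.nrd(y))) {
  nx <- length(x)
  stopifnot(length(y) == nx, all(is.finite(c(x, y, gx, gy))))
  h <- rep(h, length.out = 2L) / 4     # same bandwidth scaling as kde2d
  ax <- outer(gx, x, "-") / h[1L]      # standardised distances, x direction
  ay <- outer(gy, y, "-") / h[2L]      # standardised distances, y direction
  z <- tcrossprod(matrix(dnorm(ax), , nx), matrix(dnorm(ay), , nx)) /
    (nx * h[1L] * h[2L])
  list(x = gx, y = gy, z = z)
}

set.seed(123)
x <- rnorm(100, 1, 0.1)
set.seed(456)
y <- rnorm(100, 1, 0.5)

gx <- seq(-2, 2, length.out = 50)
# Coarse where little happens, fine where the data live (assumed split at 0).
gy <- c(seq(-2, 0, by = 0.5), seq(0.05, 2, length.out = 45))
den_var <- kde2d_grid(x, y, gx, gy)
```

Memory use scales with length(gx) * length(gy), so the fine spacing is only paid for in the regions that need it.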
And the modified kde2d returns the kernel density estimate at the specified positions. Works very well.
Suppose I have now
The problem is that I can no longer use sample to sample from slices along the x-axis, because the part on the left side of the distribution has a much finer grid and thus a higher probability of being sampled by sample.
What can I do to have a fine grid where I need it, but sample from the distribution according to its proper densities? Thank you a lot.
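To make the bias concrete, here is a small sketch with a flat density on a grid that is fine on the left and coarse on the right. It also shows one possible correction, which is to weight each grid point by the width of the cell it represents. The names grid_y, dens and cell_width are made up for this illustration, and the weighting is a generic approach, not something taken from kde2d.

```r
# Flat density on [0, 2], evaluated on a grid that is fine on the left half
# and coarse on the right half.
grid_y <- c(seq(0, 1, by = 0.01), seq(1.25, 2, by = 0.25))
dens   <- dunif(grid_y, 0, 2)

# Width of the cell each grid point stands for: cell boundaries are the
# midpoints between neighbours, and the two end points get half-cells.
n_pts <- length(grid_y)
midpoints  <- (grid_y[-n_pts] + grid_y[-1]) / 2
cell_width <- diff(c(grid_y[1], midpoints, grid_y[n_pts]))

p_naive    <- dens / sum(dens)                            # what sample() uses
p_weighted <- dens * cell_width / sum(dens * cell_width)  # density times width

left <- grid_y < 1
sum(p_naive[left])     # far above 0.5: the fine region is over-sampled
sum(p_weighted[left])  # about 0.5, as a flat density requires

set.seed(1)
draws <- sample(grid_y, size = 10, replace = TRUE, prob = dens * cell_width)
```

With the weights in place, sample still returns grid points only, so the coarse region remains coarsely resolved; the weights only fix how often each region is drawn.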
Use approx on conditional_probabilty_density with a new n.
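A minimal sketch of that suggestion follows. Because the modified kde2d is not shown here, the slice is a stand-in built from dnorm on an uneven grid (an assumption made for illustration only), and n = 200 is an arbitrary choice. approx interpolates the slice linearly onto n equally spaced points, so every point stands for the same width and sample can be used as before.

```r
# Stand-in for a slice from the modified kde2d: density values on an
# unevenly spaced grid (coarse below 0, fine above).
gy <- c(seq(-2, 0, by = 0.5), seq(0.05, 2, length.out = 45))
conditional_probabilty_density <- list(x = gy, y = dnorm(gy, 1, 0.5))

# Linear interpolation onto n equally spaced points over the same range.
resampled <- approx(conditional_probabilty_density$x,
                    conditional_probabilty_density$y,
                    n = 200)

set.seed(789)
simulated_sample <- sample(resampled$x, size = 10, replace = TRUE,
                           prob = resampled$y)
```

The interpolation happens on a single 1D slice, so a large n here is cheap compared with raising n in kde2d. It does not add detail in the coarse regions; it only evens out the spacing.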