Simulate from kernel density estimator with variable underlying grid

Problem description

I have a dataset that I'm using to create an empirical probability distribution by kernel density estimation. Right now I'm using kde2d from R's MASS package. After estimating the probability distribution, I use sample to draw from slices of the 2D distribution along the x-axis, much as described here. Example code looks like this:

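```r
library(MASS)

set.seed(123)
x <- rnorm(100, 1, 0.1)
set.seed(456)
y <- rnorm(100, 1, 0.5)

# 2D kernel density estimate on a regular 50 x 50 grid over [-2, 2] x [-2, 2]
den <- kde2d(x, y, n = 50, lims = c(-2, 2, -2, 2))
# to plot this 2D KDE (persp() is part of base graphics):
# persp(den)

# density along the y grid at the 40th x grid point
conditional_probabilty_density <- list(x = den$y, y = den$z[40, ])
# to plot the slice:
# plot(conditional_probabilty_density)

# sample() normalizes prob itself, so unnormalized density values are fine
simulated_sample <- sample(conditional_probabilty_density$x, size = 10,
                           replace = TRUE, prob = conditional_probabilty_density$y)
```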

The den looks like this

My data has known regions with a lot of fluctuation, which require a fine grid. Other regions have essentially no data points, and nothing is going on there. I would be fine if I could just set the n parameter of kde2d to a very high number so that my data is well resolved everywhere. Unfortunately, this is not possible due to memory constraints.
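For a sense of scale, here is a back-of-the-envelope sketch of how the memory for the returned density matrix `z` grows with the grid size. It assumes 8-byte doubles and counts only `z`; the helper name `z_bytes` is hypothetical.

```r
# Memory for the n[1] x n[2] density matrix z returned by MASS::kde2d,
# at 8 bytes per double. kde2d also builds intermediate matrices of size
# n[1] x length(x) and n[2] x length(y), which are not counted here.
z_bytes <- function(nx, ny = nx) 8 * nx * ny

sizes <- z_bytes(c(50, 1000, 10000))
print(sizes / 1e6)   # megabytes
```

The cost is quadratic when both axes are refined together. kde2d also accepts `n` as a length-2 vector, so each axis can get its own resolution, but the spacing along each axis stays uniform.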

That's why I thought I could modify the kde2d function to use a non-constant grid spacing.
In the source code of kde2d, one can modify the line

```r
gy <- seq.int(lims[3L], lims[4L], length.out = n[2L])
```

and put whatever grid spacing is desired on the y-axis, for example:

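```r
# coarse steps of 0.5 on [-1, 0], then n[2L] - length(a) evenly spaced points on [0.1, 2]
a <- seq(-1, 0, 0.5)
gy <- c(a, seq.int(0.1, 2, length.out = n[2L] - length(a)))
```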

The modified kde2d then returns the kernel density estimate at the specified positions, which works very well. Suppose I now have such an estimate.
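A minimal sketch of such a modified estimator follows. Rather than editing MASS itself, it defines a hypothetical function `kde2d_grid` that takes the grids `gx` and `gy` directly and otherwise mirrors the Gaussian product-kernel computation in the MASS source of kde2d (bandwidth from `bandwidth.nrd`, divided by 4). It assumes that source matches your installed MASS version; the check in the usage below compares it against `kde2d` on a uniform grid.

```r
library(MASS)  # for bandwidth.nrd() and kde2d()

# Hypothetical helper: the same estimate kde2d computes, but evaluated on
# user-supplied grids gx and gy, which need not be equally spaced.
kde2d_grid <- function(x, y, gx, gy, h) {
  nx <- length(x)
  stopifnot(length(y) == nx)
  h <- if (missing(h)) c(bandwidth.nrd(x), bandwidth.nrd(y)) else rep(h, length.out = 2L)
  h <- h / 4                        # same bandwidth scaling as kde2d
  ax <- outer(gx, x, "-") / h[1L]   # length(gx) x nx standardized distances
  ay <- outer(gy, y, "-") / h[2L]   # length(gy) x nx standardized distances
  z <- tcrossprod(matrix(dnorm(ax), ncol = nx), matrix(dnorm(ay), ncol = nx)) /
    (nx * h[1L] * h[2L])
  list(x = gx, y = gy, z = z)
}

set.seed(123); x <- rnorm(100, 1, 0.1)
set.seed(456); y <- rnorm(100, 1, 0.5)

# Non-uniform y grid as in the question: coarse on [-1, 0], fine on [0.1, 2]
gx <- seq.int(-2, 2, length.out = 50)
a  <- seq(-1, 0, 0.5)
gy <- c(a, seq.int(0.1, 2, length.out = 50 - length(a)))
den_var <- kde2d_grid(x, y, gx, gy)
```

With equally spaced `gx` and `gy` spanning `lims`, `kde2d_grid(x, y, gx, gy)$z` should agree with `kde2d(x, y, n = ..., lims = ...)$z`, which is a quick way to check the sketch against your MASS version.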

The problem is that I can no longer use sample to draw from slices along the x-axis: the finely gridded part of the distribution contains many more points per unit length, so it has a higher probability of being drawn by sample.
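The bias can be seen with a hypothetical slice whose true density is flat (Uniform(0, 2)) but which is evaluated on a grid that is coarse on the left and fine on the right. Passing the density values straight to `sample` makes every grid point equally likely, so the coarse region is under-sampled. The sketch also shows one possible correction that is different from interpolation: multiplying each density value by the width of the interval its grid point stands for (trapezoidal-rule weights), which turns densities into approximate probability masses.

```r
# Hypothetical slice: flat density of Uniform(0, 2) on a grid that is
# coarse on [0, 0.8] and fine on [1, 2]
gy   <- c(seq(0, 0.8, by = 0.2), seq(1, 2, by = 0.02))
dens <- rep(0.5, length(gy))

# Naive weights: density values only, so here every grid point counts equally
p_naive <- dens / sum(dens)

# Width each grid point stands for: half the gap to each neighbour
# (trapezoidal-rule weights); density * width approximates probability mass
width      <- diff(c(gy[1], gy, gy[length(gy)]), lag = 2) / 2
p_weighted <- dens * width / sum(dens * width)

# Probability assigned to the coarse region y < 1 (true value: 0.5)
sum(p_naive[gy < 1])      # 5 / 56, far too small
sum(p_weighted[gy < 1])   # 0.45; the grid point at 1 covers [0.9, 1.01]

set.seed(1)
draws <- sample(gy, size = 10000, replace = TRUE, prob = p_weighted)
```

Width weighting keeps the draws on the original grid points, so in the coarse region the samples can only take the few coarse values.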

What can I do to keep a fine grid where I need it, but still sample from the distribution according to its actual density? Thanks a lot.

Solution

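Use approx on conditional_probabilty_density with a new n. This linearly interpolates the slice onto n equally spaced points, so each point again represents the same width and sample can be used as before.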
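A minimal sketch of this approach, using the same hypothetical flat-density slice on a non-uniform grid as a stand-in for `conditional_probabilty_density` (the variable names and `n = 200` are illustrative assumptions). `approx(x, y, n = ...)` without `xout` interpolates at `n` equally spaced points spanning the range of `x`.

```r
# Hypothetical non-uniform slice: flat Uniform(0, 2) density, coarse on
# [0, 0.8] and fine on [1, 2]
gy    <- c(seq(0, 0.8, by = 0.2), seq(1, 2, by = 0.02))
slice <- list(x = gy, y = rep(0.5, length(gy)))

# Linearly interpolate onto n equally spaced points spanning range(slice$x)
uniform_slice <- approx(slice$x, slice$y, n = 200)

# Equal spacing means equal width per point, so the interpolated density
# values can be passed to sample() as weights directly
set.seed(1)
simulated_sample <- sample(uniform_slice$x, size = 10000, replace = TRUE,
                           prob = uniform_slice$y)
mean(simulated_sample < 1)   # roughly 0.5
```

For the question's data this would be `approx(conditional_probabilty_density$x, conditional_probabilty_density$y, n = 1000)` or similar. Since only a single 1D slice is interpolated, n can be much larger than the 2D grid allows. The interpolated density between coarse grid points is linear, so it is only as accurate as the coarse grid there.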
