Using scipy gaussian kernel density estimation to calculate the inverse CDF


Question

The gaussian_kde class in scipy.stats has a method evaluate that returns the value of the PDF at an input point. I'm trying to use gaussian_kde to estimate the inverse CDF. The motivation is to generate Monte Carlo realizations of some input data whose statistical distribution is numerically estimated using KDE. Is there a method bound to gaussian_kde that serves this purpose?

The example below shows how this should work for the case of a Gaussian distribution. First I show how to do the PDF calculation, to set up the specific API I'm trying to achieve:

import numpy as np 
from scipy.stats import norm, gaussian_kde

npts_kde = int(5e3)
n = np.random.normal(loc=0, scale=1, size=npts_kde)
kde = gaussian_kde(n)

npts_sample = int(1e3)
x = np.linspace(-3, 3, npts_sample)
kde_pdf = kde.evaluate(x)
norm_pdf = norm.pdf(x)

Is there an analogously simple way to compute the inverse CDF? The norm distribution has a very handy isf method (the inverse survival function) that does exactly this when evaluated at 1 minus the CDF value:

cdf_value = np.sort(np.random.rand(npts_sample))
cdf_inv = norm.isf(1 - cdf_value)

Does such a function exist for gaussian_kde? Or is it straightforward to construct such a function from the already implemented methods?

Recommended answer

The method integrate_box_1d can be used to compute the CDF, but it is not vectorized, so you'll need to loop over points. If memory is not an issue, rewriting its source code (which is essentially just a call to special.ndtr) in vectorized form may speed things up. Note that the differences between the evaluation points and the data must be divided by the kernel standard deviation before they are passed to ndtr:

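from scipy.special import ndtr
import matplotlib.pyplot as plt
# Standard deviation of each Gaussian kernel (the KDE bandwidth)
stdev = np.sqrt(kde.covariance)[0, 0]
# KDE CDF at x: mean of the Gaussian CDFs centred on the data points n
pde_cdf = ndtr(np.subtract.outer(x, n) / stdev).mean(axis=1)
plt.plot(x, pde_cdf)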
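As a rough check, the looped integrate_box_1d route and the vectorized ndtr route can be compared directly. This is a minimal sketch, not part of the original answer: the seed, sample size, and grid are arbitrary choices, and it assumes a 1-D dataset with the default (uniform) weights.

```python
import numpy as np
from scipy.special import ndtr
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)          # arbitrary seed, for reproducibility
n = rng.normal(loc=0, scale=1, size=2000)
kde = gaussian_kde(n)
x = np.linspace(-3, 3, 50)

# Loop version: integrate the KDE from -inf up to each evaluation point.
cdf_loop = np.array([kde.integrate_box_1d(-np.inf, xi) for xi in x])

# Vectorized version: mean of Gaussian CDFs centred on each data point,
# with differences scaled by the kernel standard deviation.
stdev = np.sqrt(kde.covariance[0, 0])
cdf_vec = ndtr(np.subtract.outer(x, n) / stdev).mean(axis=1)

# Largest disagreement between the two approaches.
max_diff = np.max(np.abs(cdf_loop - cdf_vec))
```

The vectorized form builds a len(x) by len(n) array, which is where the memory caveat comes from.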

The inverse function can be plotted by swapping the axes, i.e. plot(pde_cdf, x). If the goal is to compute the inverse function at a specific point, consider interpolating the computed CDF values with a spline and using the inverse of that interpolant.
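One way to sketch the interpolation idea is shown below. It swaps in plain linear interpolation (np.interp) for the spline mentioned above, because it stays monotone without extra care. The helper name kde_ppf, the grid padding of 5 kernel standard deviations, and the grid size are illustrative assumptions, not part of gaussian_kde.

```python
import numpy as np
from scipy.special import ndtr
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)          # arbitrary seed, for reproducibility
n = rng.normal(loc=0, scale=1, size=2000)
kde = gaussian_kde(n)
stdev = np.sqrt(kde.covariance[0, 0])

# Grid padded beyond the data so it covers essentially all of the KDE mass.
grid = np.linspace(n.min() - 5 * stdev, n.max() + 5 * stdev, 1000)
cdf_grid = ndtr(np.subtract.outer(grid, n) / stdev).mean(axis=1)

def kde_ppf(q):
    """Hypothetical helper: approximate inverse CDF of the KDE.

    Interpolates with the roles of x and CDF swapped; values of q outside
    the tabulated CDF range are clamped to the ends of the grid.
    """
    return np.interp(q, cdf_grid, grid)

# Inverse-transform sampling: push uniform draws through the inverse CDF.
u = rng.uniform(size=5000)
samples = kde_ppf(u)
median = kde_ppf(0.5)
```

The accuracy depends on the grid resolution; a monotone interpolant such as scipy.interpolate.PchipInterpolator could be used instead of np.interp if a smoother inverse is wanted. If only random draws are needed rather than the inverse CDF itself, gaussian_kde also provides a resample method.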
