Getting the bandwidth used by SciPy's gaussian_kde function


Question


I'm using SciPy's stats.gaussian_kde function to generate a kernel density estimate (kde) function from a data set of x,y points.

This is a simple MWE of my code:

import numpy as np
from scipy import stats

def random_data(N):
    # Generate some random data.
    return np.random.uniform(0., 10., N)

# Data lists.
x_data = random_data(100)
y_data = random_data(100)

# Obtain the gaussian kernel.
kernel = stats.gaussian_kde(np.vstack([x_data, y_data]))

Since I'm not setting a bandwidth manually (via the bw_method argument), the function defaults to Scott's rule (see the function's documentation). What I need is the bandwidth value that stats.gaussian_kde sets automatically.

I've tried using:

print(kernel.set_bandwidth())

but it always returns None instead of a float.
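The None is expected: set_bandwidth() is a setter that recomputes the kernel in place, so the chosen value has to be read back from the object afterwards. The sketch below assumes a reasonably recent SciPy, where the fitted estimator exposes a factor attribute; the seeded normal sample is only an illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 2.0, 100)
kde = gaussian_kde(sample)  # default bw_method is Scott's rule

# set_bandwidth() recomputes the kernel in place and returns None...
result = kde.set_bandwidth(bw_method="silverman")
print(result)  # None

# ...so the factor it chose has to be read back from the object.
silverman_factor = kde.factor
print(silverman_factor == kde.covariance_factor())  # True

# A scalar bw_method is stored directly as the factor.
kde.set_bandwidth(bw_method=0.3)
print(kde.factor)  # 0.3
```

Note that factor is a dimensionless scale, not the bandwidth in data units; the answer below shows how to turn it into one.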

Solution

Short answer

The bandwidth is kernel.covariance_factor() multiplied by the standard deviation of the sample you are using (SciPy computes that standard deviation with ddof=1).

(This holds for a 1-D sample; by default the factor is computed with Scott's rule of thumb.)
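For the 2-D x,y data in the question there is no single scalar bandwidth. Scott's rule sets the factor to n**(-1/(d+4)) for n unweighted points in d dimensions, and the kernel covariance is the sample covariance scaled by factor**2. A minimal sketch, reusing the question's uniform data but with a seeded generator (an assumption for reproducibility) and assuming a SciPy version that exposes the covariance attribute:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x_data = rng.uniform(0.0, 10.0, 100)
y_data = rng.uniform(0.0, 10.0, 100)
data = np.vstack([x_data, y_data])  # shape (d, n) = (2, 100)

kernel = stats.gaussian_kde(data)
d, n = data.shape

# Scott's rule for unweighted data: factor = n ** (-1 / (d + 4)).
scott = n ** (-1.0 / (d + 4))
print(kernel.factor, scott)  # both ≈ 0.464

# In d dimensions the kernel "bandwidth" is a full covariance matrix:
# the sample covariance (np.cov, ddof=1) scaled by factor**2.
kernel_cov = np.cov(data) * kernel.factor ** 2
print(np.allclose(kernel.covariance, kernel_cov))  # True

# Per-axis kernel standard deviations (the diagonal bandwidths).
print(np.sqrt(np.diag(kernel.covariance)))
```

In 1-D this reduces to the scalar rule above: the single diagonal entry is (factor * std)**2.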

Example:

import numpy as np
from scipy.stats import gaussian_kde

sample = np.random.normal(0., 2., 100)
kde = gaussian_kde(sample)
f = kde.covariance_factor()
# SciPy computes the data covariance with ddof=1, so match it here.
bw = f * sample.std(ddof=1)
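A quick numerical check (a sketch, assuming a SciPy version where the fitted estimator exposes a covariance attribute): in 1-D, kde.covariance is a 1x1 matrix equal to bw**2. SciPy estimates the data covariance with np.cov's default ddof=1, while ndarray.std() defaults to ddof=0, so the exact match needs sample.std(ddof=1); with 100 points the two differ by about 0.5%. The seeded generator is an assumption added for reproducibility.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 2.0, 100)
kde = gaussian_kde(sample)

f = kde.covariance_factor()
bw = f * sample.std(ddof=1)  # same ddof as SciPy's internal np.cov

# In 1-D, kde.covariance is a 1x1 matrix holding bw**2.
bw_from_kde = np.sqrt(kde.covariance[0, 0])
print(bw, bw_from_kde)
print(np.isclose(bw, bw_from_kde))  # True
```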

The resulting pdf can be plotted like this:

from matplotlib.pyplot import plot
x_grid = np.linspace(-6, 6, 200)
plot(x_grid, kde.evaluate(x_grid))

You can check this by building the same KDE with another library, for example scikit-learn:

import numpy as np
from sklearn.neighbors import KernelDensity

def kde_sklearn(x, x_grid, bandwidth):
    # KernelDensity uses a Gaussian kernel by default
    kde_skl = KernelDensity(bandwidth=bandwidth)
    kde_skl.fit(x[:, np.newaxis])
    # score_samples() returns the log-density at each grid point
    log_pdf = kde_skl.score_samples(x_grid[:, np.newaxis])
    pdf = np.exp(log_pdf)
    return pdf

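Now, with the same sample and grid as above, plot both candidate bandwidths. Passing only the covariance factor f gives a narrower kernel that does not match SciPy's curve, while passing bw reproduces it: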

plot(x_grid, kde_sklearn(sample, x_grid, f))

plot(x_grid, kde_sklearn(sample, x_grid, bw))
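If scikit-learn is not installed, the same comparison can be made numerically. The sketch below uses a hypothetical helper, gaussian_kde_manual, that averages normal densities N(x_i, bandwidth**2) over the sample, which is what a 1-D Gaussian KDE with an unweighted sample computes. The seeded generator is an assumption for reproducibility.

```python
import numpy as np
from scipy.stats import gaussian_kde

def gaussian_kde_manual(x, x_grid, bandwidth):
    """Hypothetical helper: mean of N(x_i, bandwidth**2) densities on x_grid."""
    z = (x_grid[:, np.newaxis] - x[np.newaxis, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 2.0, 100)
x_grid = np.linspace(-6, 6, 200)

kde = gaussian_kde(sample)
f = kde.covariance_factor()
bw = f * sample.std(ddof=1)

scipy_pdf = kde.evaluate(x_grid)
matches_bw = np.allclose(scipy_pdf, gaussian_kde_manual(sample, x_grid, bw))
matches_f = np.allclose(scipy_pdf, gaussian_kde_manual(sample, x_grid, f))
print(matches_bw)  # True
print(matches_f)   # False
```

Comparing arrays this way avoids relying on how two overlaid curves look in a plot.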
