R的nrd0是否有scipy/numpy替代项? [英] Is there a scipy/numpy alternative to R's nrd0?
问题描述
来自 R文档 ...
bw.nrd0实现了一个经验法则,用于选择带宽 高斯核密度估计器.预设值为0.9倍 标准差和四分位间距的最小值除 是样本大小的1.34倍乘以负五分之一(= Silverman的经验法则",Silverman(1986,第48页,eqn(3.31)) 除非四分位数重合,否则将获得正结果 保证.
bw.nrd0 implements a rule-of-thumb for choosing the bandwidth of a Gaussian kernel density estimator. It defaults to 0.9 times the minimum of the standard deviation and the interquartile range divided by 1.34 times the sample size to the negative one-fifth power (= Silverman's ‘rule of thumb’, Silverman (1986, page 48, eqn (3.31)) unless the quartiles coincide when a positive result will be guaranteed.
在1到400(等于np.arange(1,401)
)的数组上,nrd0将返回31.39367.当我尝试在python中实现类似的东西时...
On an array from 1 to 400 (equivalent to np.arange(1,401)
), nrd0 will return 31.39367. When I try to implement something similar in python...
def nrd0_python(x):
X = min(np.std(x), np.percentile(x,25))
top = 0.9*X
bottom = 1.34*len(x)**(-0.2)
return top/bottom
我超过200(准确地说是224.28217762858455)
I get upwards of 200 (to be exact, 224.28217762858455)
Silverman的经验法则是否有已知的python函数?
Are there any known python functions for silverman's rule of thumb?
推荐答案
您对IQR的计算不正确:
You don't have the right calculation for IQR:
import numpy
def bw_nrd0(x):
if len(x) < 2:
raise(Exception("need at least 2 data points"))
hi = numpy.std(x, ddof=1)
q75, q25 = numpy.percentile(x, [75 ,25])
iqr = q75 - q25
lo = min(hi, iqr/1.34)
if not ((lo == hi) or (lo == abs(x[0])) or (lo == 1)):
lo = 1
return 0.9 * lo *len(x)**-0.2
这与R函数的返回结果相同,表示为:
This returns the same as the R function, which is given as:
> bw.nrd0
function (x)
{
if (length(x) < 2L)
stop("need at least 2 data points")
hi <- sd(x)
if (!(lo <- min(hi, IQR(x)/1.34)))
(lo <- hi) || (lo <- abs(x[1L])) || (lo <- 1)
0.9 * lo * length(x)^(-0.2)
}
<bytecode: 0x0000000010c688b0>
<environment: namespace:stats>
My original reading of the second if
statement in the R
code was incorrect, see this question. It is meant to capture cases where lo == 0
(because sd(x) == 0
), and in that case, where the vector is all zeros. If my understanding is correct, it should read:
if not lo:
if hi:
lo = hi
elif abs(x[0]):
lo = abs(x[0])
else:
lo = 1
类似于R
代码,您可以将其缩短为:
Like the R
code, you can shorten this to :
lo = lo or hi or abs(x[0]) or 1
包括这些检查以及对向量长度的检查很重要.
It is important to include these checks, along with the check for the length of the vector.
这篇关于R的nrd0是否有scipy/numpy替代项?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!