R的nrd0是否有scipy/numpy替代项? [英] Is there a scipy/numpy alternative to R's nrd0?

查看:104
本文介绍了R的nrd0是否有scipy/numpy替代项?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

来自 R文档 ...

bw.nrd0实现了一个经验法则,用于选择带宽 高斯核密度估计器.预设值为0.9倍 标准差和四分位间距的最小值除 是样本大小的1.34倍乘以负五分之一(= Silverman的经验法则",Silverman(1986,第48页,eqn(3.31)) 除非四分位数重合,否则将获得正结果 保证.

bw.nrd0 implements a rule-of-thumb for choosing the bandwidth of a Gaussian kernel density estimator. It defaults to 0.9 times the minimum of the standard deviation and the interquartile range divided by 1.34 times the sample size to the negative one-fifth power (= Silverman's ‘rule of thumb’, Silverman (1986, page 48, eqn (3.31)) unless the quartiles coincide when a positive result will be guaranteed.

在1到400(等于np.arange(1,401))的数组上,nrd0将返回31.39367.当我尝试在python中实现类似的东西时...

On an array from 1 to 400 (equivalent to np.arange(1,401)), nrd0 will return 31.39367. When I try to implement something similar in python...

def nrd0_python(x):

    X = min(np.std(x), np.percentile(x,25))

    top = 0.9*X
    bottom = 1.34*len(x)**(-0.2)

    return top/bottom

我超过200(准确地说是224.28217762858455)

I get upwards of 200 (to be exact, 224.28217762858455)

Silverman的经验法则是否有已知的python函数?

Are there any known python functions for silverman's rule of thumb?

推荐答案

您对IQR的计算不正确:

You don't have the right calculation for IQR:

import numpy

def bw_nrd0(x):

    if len(x) < 2:
        raise(Exception("need at least 2 data points"))

    hi = numpy.std(x, ddof=1)
    q75, q25 = numpy.percentile(x, [75 ,25])
    iqr = q75 - q25
    lo = min(hi, iqr/1.34)

    if not ((lo == hi) or (lo == abs(x[0])) or (lo == 1)):
        lo = 1

    return 0.9 * lo *len(x)**-0.2

这与R函数的返回结果相同,表示为:

This returns the same as the R function, which is given as:

> bw.nrd0
function (x) 
{
    if (length(x) < 2L) 
        stop("need at least 2 data points")
    hi <- sd(x)
    if (!(lo <- min(hi, IQR(x)/1.34))) 
        (lo <- hi) || (lo <- abs(x[1L])) || (lo <- 1)
    0.9 * lo * length(x)^(-0.2)
}
<bytecode: 0x0000000010c688b0>
<environment: namespace:stats>

我对R代码中第二个if语句的原始阅读不正确,请参见

My original reading of the second if statement in the R code was incorrect, see this question. It is meant to capture cases where lo == 0 (because sd(x) == 0), and in that case, where the vector is all zeros. If my understanding is correct, it should read:

if not lo:
    if hi:
        lo = hi
    elif abs(x[0]):
        lo = abs(x[0])
    else:
        lo = 1

类似于R代码,您可以将其缩短为:

Like the R code, you can shorten this to :

lo = lo or hi or abs(x[0]) or 1

包括这些检查以及对向量长度的检查很重要.

It is important to include these checks, along with the check for the length of the vector.

这篇关于R的nrd0是否有scipy/numpy替代项?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆