Issues with Python scipy rv_continuous implementation
Question
I'm trying to create a subclass of rv_continuous with a custom distribution whose pdf I can calculate through a number of functions.
Here's what I've done so far:
import numpy as np
from scipy.stats import rv_continuous
Helper functions
def func1(xx, a_, b_, rho, m, sigma):
    return a_ + b_*(rho*(xx-m) + np.sqrt((xx-m)*(xx-m) + sigma*sigma))

def func2(xx, a_, b_, rho, m, sigma):
    # First derivative of func1 with respect to xx
    sig2 = sigma*sigma
    return b_*(rho*np.sqrt((xx-m)*(xx-m)+sig2)+xx-m)/(np.sqrt((xx-m)*(xx-m)+sig2))

def func3(xx, a_, b_, rho, m, sigma):
    # Second derivative of func1 with respect to xx
    sig2 = sigma*sigma
    return b_*sig2/(np.sqrt((xx-m)*(xx-m)+sig2)*((xx-m)*(xx-m)+sig2))

def func4(xx, a_, b_, rho, m, sigma):
    w = func1(xx, a_, b_, rho, m, sigma)
    w1 = func2(xx, a_, b_, rho, m, sigma)
    w2 = func3(xx, a_, b_, rho, m, sigma)
    return (1.-0.5*xx*w1/w)*(1.0-0.5*xx*w1/w) - 0.25*w1*w1*(0.25 + 1./w) + 0.5*w2

def func5(xx, a_, b_, rho, m, sigma):
    vsqrt = np.sqrt(func1(xx, a_, b_, rho, m, sigma))
    return -xx/vsqrt - 0.5*vsqrt
Finally, the density function
def density(xx, a_, b_, rho, m, sigma):
    dm = func5(xx, a_, b_, rho, m, sigma)
    return func4(xx, a_, b_, rho, m, sigma)*np.exp(-0.5*dm*dm)/np.sqrt(2.*np.pi*func1(xx, a_, b_, rho, m, sigma))
A set of parameters
Params = 1.0073, 0.3401026, -0.8, 0.000830, 0.5109564
Check the pdf from the function
xmin, xmax, nbPoints = -10., 10., 2000
x_real = np.linspace(xmin, xmax, nbPoints)
den_from_func = density(x_real, *Params)
Now build my distribution class
class density_gen(rv_continuous):
    def _pdf(self, x, a_hat, b_hat, rho, m, sigma):
        return density(x, a_hat, b_hat, rho, m, sigma)
Instantiate it
my_density = density_gen(name='density_gen')
my_density.a, my_density.b, my_density.numargs
As I've specified _pdf, I should have a working distribution instance.
This works:
pdf = my_density._pdf(x_real, *Params)
cdf works too, although it's extremely slow:
cdf = my_density._cdf(x_real, *Params)
my_density._cdf(0.1, *Params)
But for all the other methods I get nans, for instance:
my_density.mean(*Params)
my_density.ppf(0.01, *Params)
What am I doing wrong here?
Answer
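It appears you need to add the _argcheck method to density_gen, since your distribution uses custom parameters. The default _argcheck only accepts shape parameters that are strictly positive, and rho = -0.8 here, so the public methods treat the arguments as invalid and return nan: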
class density_gen(rv_continuous):
    def _argcheck(self, *Params):
        return True

    def _pdf(self, x, a_hat, b_hat, rho, m, sigma):
        return density(x, a_hat, b_hat, rho, m, sigma)

my_density = density_gen(name='density_gen')
pdf = my_density._pdf(x_real, *Params)
print(my_density.rvs(*Params, size=5))
print(my_density.mean(*Params))
print(my_density.ppf(0.01, *Params))
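Returning True from _argcheck accepts every parameter combination, including ones for which the density may not be valid. As a rough sketch of a stricter check, the toy class below (toy_gen, mu and s are hypothetical names, not part of the question) allows a negative location parameter but still rejects a non-positive scale. The valid ranges for density_gen's own five parameters are not stated in the question, so they are not guessed here.

```python
import numpy as np
from scipy.stats import rv_continuous


class toy_gen(rv_continuous):
    """Toy distribution (hypothetical): a normal density with shape parameters mu and s."""

    def _argcheck(self, mu, s):
        # mu may be any finite real number, including negatives; s must be positive.
        return np.isfinite(mu) & (s > 0)

    def _pdf(self, x, mu, s):
        z = (x - mu) / s
        return np.exp(-0.5 * z * z) / (s * np.sqrt(2.0 * np.pi))


toy = toy_gen(name="toy")
ok = toy.pdf(0.0, -1.0, 2.0)    # negative mu passes the custom check
bad = toy.pdf(0.0, -1.0, -2.0)  # negative s fails the check, so the result is nan
```

With the default _argcheck, the first call would also come back as nan, because mu = -1.0 is not strictly positive.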
However, rvs, mean, and so on will then be very slow, presumably because the generic implementation needs to integrate the PDF numerically every time it generates a random number or calculates a statistic. If speed is at a premium, you will thus need to add to density_gen an _rvs method that uses its own sampler. An example of this is my own DensityInversionSampler, which generates random numbers by numerical inversion, when given only the PDF and the sampling domain.
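To illustrate the idea of sampling by numerical inversion, here is a minimal NumPy-only sketch. It is not the DensityInversionSampler mentioned above; sample_by_inversion and its parameters are hypothetical names, and it assumes the PDF can be evaluated on a vector and that nearly all probability mass lies inside the chosen interval [lo, hi]. It tabulates the CDF once on a grid and then inverts it by linear interpolation, so the PDF is integrated only once rather than per sample.

```python
import numpy as np


def sample_by_inversion(pdf, lo, hi, size, n_grid=4001, rng=None):
    """Draw `size` samples from `pdf` restricted to [lo, hi] using a tabulated CDF."""
    rng = np.random.default_rng() if rng is None else rng
    grid = np.linspace(lo, hi, n_grid)
    dens = np.clip(pdf(grid), 0.0, None)  # guard against tiny negative values
    # Cumulative trapezoidal integral of the density over the grid.
    steps = 0.5 * (dens[1:] + dens[:-1]) * np.diff(grid)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]  # renormalise on the truncated domain
    # Invert the CDF by linear interpolation at uniform random levels in [0, 1).
    u = rng.random(size)
    return np.interp(u, cdf, grid)


# Usage on a standard normal density (chosen only because its moments are known).
def std_normal(x):
    return np.exp(-0.5 * x * x) / np.sqrt(2.0 * np.pi)


draws = sample_by_inversion(std_normal, -6.0, 6.0, size=20000,
                            rng=np.random.default_rng(0))
```

A function like this could be called from a custom _rvs method with the distribution's own PDF and parameters; the accuracy depends on the grid resolution and on how much mass the truncation to [lo, hi] discards.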