Fitting data with a custom distribution using scipy.stats
Question
I noticed that scipy has no implementation of the skewed generalized t distribution. It would be useful for me to fit this distribution to some data I have. Unfortunately, fit does not seem to work for me in this case. To explain further, I have implemented the distribution like so:
import numpy as np
import scipy.stats as st
from scipy.special import beta


class sgt(st.rv_continuous):
    def _pdf(self, x, mu, sigma, lam, p, q):
        # Scaling constant v.
        v = q ** (-1 / p) * (
            (3 * lam ** 2 + 1) * (beta(3 / p, q - 2 / p) / beta(1 / p, q))
            - 4 * lam ** 2 * (beta(2 / p, q - 1 / p) / beta(1 / p, q)) ** 2
        ) ** (-1 / 2)
        # Location shift m.
        m = (2 * v * sigma * lam * q ** (1 / p)
             * beta(2 / p, q - 1 / p) / beta(1 / p, q))
        z = x - mu + m
        # The skew factor belongs in the denominator of the inner fraction;
        # multiplying by it instead gives a density that is not normalized.
        inner = np.abs(z) ** p / (
            q * (v * sigma) ** p * (lam * np.sign(z) + 1) ** p
        ) + 1
        fx = p / (2 * v * sigma * q ** (1 / p) * beta(1 / p, q)
                  * inner ** (1 / p + q))
        return fx

    def _argcheck(self, mu, sigma, lam, p, q):
        # Elementwise checks, so array-valued parameters also work.
        s = sigma > 0
        l = (lam > -1) & (lam < 1)
        p_bool = p > 0
        q_bool = q > 0
        all_bool = s & l & p_bool & q_bool
        return all_bool
This all works fine, and I can generate random variates with given parameters without any problem. The custom _argcheck is required because the default check, which only requires every shape parameter to be positive, is not suitable here: lam may be negative.
sgt_inst = sgt(name='sgt')
vars = sgt_inst.rvs(mu=1, sigma=3, lam=-0.1, p=2, q=50, size=100)
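A note on the range check for lam: scipy may hand _argcheck arrays rather than scalars, and a chained comparison such as -1 < lam < 1 only works for scalars. The sketch below compares the two forms using plain NumPy; the helper names are hypothetical and exist only for this illustration.

```python
import numpy as np


def lam_in_range_chained(lam):
    # Chained comparison: fine for scalars, but calls bool() on arrays.
    return -1 < lam < 1


def lam_in_range_elementwise(lam):
    # Elementwise form: works for scalars and arrays alike.
    lam = np.asarray(lam)
    return (lam > -1) & (lam < 1)


lams = np.array([-0.1, 0.5, 1.0])

try:
    lam_in_range_chained(lams)
    chained_ok = True
except ValueError:
    # "The truth value of an array with more than one element is ambiguous"
    chained_ok = False

mask = lam_in_range_elementwise(lams)
print(chained_ok, mask)
```

The elementwise form returns one boolean per parameter value, which is what the other comparisons in _argcheck (sigma > 0 and so on) already do.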
However, when I try to fit these parameters, I get an error:
sgt_inst.fit(vars)
RuntimeWarning: invalid value encountered in subtract
numpy.max(numpy.abs(fsim[0] - fsim[1:])) <= fatol):
and the call just returns.
What I find strange is that when I implement the example custom Gaussian distribution shown in the docs, the fit method runs without any problem.
Any ideas?
Recommended answer
As the fit docstring says:
Starting estimates for the fit are given by input arguments; for any arguments not provided with starting estimates, self._fitstart(data) is called to generate such.
Calling sgt_inst._fitstart(data) returns (1.0, 1.0, 1.0, 1.0, 1.0, 0, 1) (the first five are shape parameters, the last two are loc and scale). It looks like _fitstart is not a sophisticated process. The value it picks for lam (the variable l in your _argcheck) is 1.0, which does not meet the -1 < lam < 1 requirement.
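To see why an invalid starting point produces that particular warning, here is a minimal sketch using a hypothetical toy distribution (split_normal_gen is not part of scipy and is much simpler than the SGT), with one skew shape restricted to -1 < lam < 1. It assumes that nnlf returns inf whenever _argcheck rejects the shapes, so every vertex of the optimizer's initial simplex around an all-ones start has the value inf, and inf - inf is nan.

```python
import numpy as np
import scipy.stats as st


class split_normal_gen(st.rv_continuous):
    # Hypothetical toy distribution: one skew shape, valid for -1 < lam < 1.
    def _pdf(self, x, lam):
        # Two half-normals with widths (1 - lam) and (1 + lam).
        width = 1 + lam * np.sign(x)
        return np.exp(-0.5 * (x / width) ** 2) / np.sqrt(2 * np.pi)

    def _argcheck(self, lam):
        return (lam > -1) & (lam < 1)


toy = split_normal_gen(name="split_normal")
data = np.random.default_rng(0).normal(size=200)

# theta = (shape, loc, scale); the first tuple mimics the all-ones default.
nll_default = toy.nnlf((1.0, 0.0, 1.0), data)
nll_valid = toy.nnlf((0.0, 0.0, 1.0), data)

# The simplex convergence test subtracts vertex values; inf - inf is nan.
fsim = np.array([nll_default, nll_default])
with np.errstate(invalid="ignore"):
    spread = fsim[0] - fsim[1:]

print(nll_default, nll_valid, spread)
```

With a valid starting shape the negative log-likelihood is finite, so the optimizer has something to compare and can make progress.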
Conclusion: provide your own starting parameters for fit, e.g.,
sgt_inst.fit(data, 0.5, 0.5, -0.5, 2, 10)
returns (1.4587093459289049, 5.471769032259468, -0.02391466905874927, 7.072893261471524, 0.741434497805832, -0.07012808188413872, 0.5308181287869771) for my random data.
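The same pattern can be tried end to end on a smaller problem. The sketch below uses a hypothetical toy distribution (split_normal_gen, not part of scipy) rather than the SGT, because sampling from it directly is cheap. Starting values for the shapes are passed positionally, and starting values for loc and scale are passed as the loc and scale keywords; the specific numbers are arbitrary assumptions, chosen only to satisfy _argcheck.

```python
import numpy as np
import scipy.stats as st


class split_normal_gen(st.rv_continuous):
    # Hypothetical toy distribution: one skew shape, valid for -1 < lam < 1.
    def _pdf(self, x, lam):
        width = 1 + lam * np.sign(x)
        return np.exp(-0.5 * (x / width) ** 2) / np.sqrt(2 * np.pi)

    def _argcheck(self, lam):
        return (lam > -1) & (lam < 1)


toy = split_normal_gen(name="split_normal")

# Draw a sample directly: the right half has probability (1 + lam) / 2.
rng = np.random.default_rng(0)
true_lam, n = -0.4, 1000
half = np.abs(rng.normal(size=n))
right = rng.random(n) < (1 + true_lam) / 2
data = np.where(right, (1 + true_lam) * half, -(1 - true_lam) * half)

# Shape start is positional; loc and scale starts are keywords.
lam_hat, loc_hat, scale_hat = toy.fit(data, 0.1, loc=0.1, scale=1.0)
print(lam_hat, loc_hat, scale_hat)
```

Because every shape and both loc and scale have explicit starting values, fit has no need to fall back on the all-ones default.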