使用 scipy truncnorm 拟合数据 [英] Fitting data using scipy truncnorm
问题描述
我有遵循高斯分布的数据.但是,数据仅对于一系列值 [xa,xb] 才是真正的高斯分布,因此我想使用
注意:偶尔会生成一个随机数据集,其中 fmin_slsqp
在计算过程中失败并出现遇到无效值".我没有对此进行进一步调查,但您的数据可能会遇到此问题.
I have data that follow a Gaussian distribution. However, the data is truly Gaussian only for a range of values [xa,xb] so I want to fit a truncated normal distribution using scipy.stats.truncnorm while using the fact that I know the range [xa,xb]. My goal is to find loc and scale.
I don't understand how to fix xa and xb in the fit. The shape parameters are 'a' and 'b', but those depend on loc and scale, which are my unknowns. Moreover, it doesn't seem to be possible to put an initial guess on 'a' and 'b' (they can only be frozen with fa and fb?). When I do:
par = truncnorm.fit(r, a=a_guess, b=b_guess, scale= scale_guess, loc = loc_guess)
I get
Unknown arguments: {'a': 0.0, 'b': 2.4444444444444446}.
Also, the fits I get are very unstable. Here's a example:
from scipy.stats import truncnorm
import matplotlib.pyplot as plt
xa, xb = 30,250
loc, loc_guess = 50, 30
scale, scale_guess = 75, 90
a,b = (xa-loc)/scale, (xb-loc)/scale
fig, ax = plt.subplots(1, 1)
x = np.linspace(xa,xb,10000)
ax.plot(x, truncnorm.pdf(x, a, b, loc=loc, scale=scale),
'r-', lw=5, alpha=0.6, label='truncnorm pdf')
r = truncnorm.rvs(a, b, loc=loc, scale=scale, size=10000)
par = truncnorm.fit(r, scale= scale_guess, loc = loc_guess)
ax.plot(x, truncnorm.pdf(x, *par),
'b-', lw=1, alpha=0.6, label='truncnorm fit')
ax.hist(r, density=True, histtype='stepfilled', alpha=0.3)
plt.legend()
plt.show()
I also often have this warning:
/home/elie/anaconda2/envs/py36/lib/python3.6/site-packages/scipy/stats/_continuous_distns.py:5823: RuntimeWarning: divide by zero encountered in log self._logdelta = np.log(self._delta)
As you have discovered, the problem is that the parameters that you want to keep fixed, xa
and xb
, are not native parameters of truncnorm
. truncnorm
has the shape parameters a
and b
, which determine the shape by setting the x-interval for the standard normal distribution. This shape is then shifted and scaled by the loc
and scale
parameters. The relation is
xa = a*scale + loc
xb = b*scale + loc
To fix xa
and xb
, you can use one of the SciPy minimizers that accepts equality constraints. Here I'll use scipy.optimize.fmin_slsqp
. (You could instead use the "omnibus" function scipy.optmize.minimize
, which includes the SLSQP solver as one of its options.)
Here's a script that demonstrate how to use fmin_slsqp
for this problem. The function func
is the objective function to be minimized. It is just a wrapper for truncnorm.nnlf
, the negative log-likelihood function. The function constraint
returns an array containing two values. These values are 0 when the constraint is satisfied.
import numpy as np
from scipy.stats import truncnorm
from scipy.optimize import fmin_slsqp
import matplotlib.pyplot as plt
def func(p, r, xa, xb):
return truncnorm.nnlf(p, r)
def constraint(p, r, xa, xb):
a, b, loc, scale = p
return np.array([a*scale + loc - xa, b*scale + loc - xb])
xa, xb = 30, 250
loc = 50
scale = 75
a = (xa - loc)/scale
b = (xb - loc)/scale
# Generate some data to work with.
r = truncnorm.rvs(a, b, loc=loc, scale=scale, size=10000)
loc_guess = 30
scale_guess = 90
a_guess = (xa - loc_guess)/scale_guess
b_guess = (xb - loc_guess)/scale_guess
p0 = [a_guess, b_guess, loc_guess, scale_guess]
par = fmin_slsqp(func, p0, f_eqcons=constraint, args=(r, xa, xb),
iprint=False, iter=1000)
xmin = 0
xmax = 300
x = np.linspace(xmin, xmax, 1000)
fig, ax = plt.subplots(1, 1)
ax.plot(x, truncnorm.pdf(x, a, b, loc=loc, scale=scale),
'r-', lw=3, alpha=0.4, label='truncnorm pdf')
ax.plot(x, truncnorm.pdf(x, *par),
'k--', lw=1, alpha=1.0, label='truncnorm fit')
ax.hist(r, bins=15, density=True, histtype='stepfilled', alpha=0.3)
ax.legend(shadow=True)
plt.xlim(xmin, xmax)
plt.grid(True)
plt.show()
Here's the plot that it generates. The sample data is random, so the plot will be different on each run.
Note: occasionally a random data set is generated for which fmin_slsqp
fails with an "invalid value encountered" during a calculation. I haven't investigated this further, but you might run into this with your data.
这篇关于使用 scipy truncnorm 拟合数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!