使用 scipy truncnorm 拟合数据 [英] Fitting data using scipy truncnorm

查看:49
本文介绍了使用 scipy truncnorm 拟合数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有遵循高斯分布的数据.但是,数据仅对于一系列值 [xa,xb] 才是真正的高斯分布,因此我想使用

注意:偶尔会生成一个随机数据集,其中 fmin_slsqp 在计算过程中失败并出现遇到无效值".我没有对此进行进一步调查,但您的数据可能会遇到此问题.

I have data that follow a Gaussian distribution. However, the data is truly Gaussian only for a range of values [xa,xb] so I want to fit a truncated normal distribution using scipy.stats.truncnorm while using the fact that I know the range [xa,xb]. My goal is to find loc and scale.

I don't understand how to fix xa and xb in the fit. The shape parameters are 'a' and 'b', but those depend on loc and scale, which are my unknowns. Moreover, it doesn't seem to be possible to put an initial guess on 'a' and 'b' (they can only be frozen with fa and fb?). When I do:

par = truncnorm.fit(r, a=a_guess, b=b_guess, scale= scale_guess, loc = loc_guess)

I get

Unknown arguments: {'a': 0.0, 'b': 2.4444444444444446}.

Also, the fits I get are very unstable. Here's a example:

from scipy.stats import truncnorm
import matplotlib.pyplot as plt

xa, xb = 30,250 
loc, loc_guess = 50, 30
scale, scale_guess = 75, 90
a,b = (xa-loc)/scale, (xb-loc)/scale

fig, ax = plt.subplots(1, 1)
x = np.linspace(xa,xb,10000)    
ax.plot(x, truncnorm.pdf(x, a, b, loc=loc, scale=scale),
        'r-', lw=5, alpha=0.6, label='truncnorm pdf')

r = truncnorm.rvs(a, b, loc=loc, scale=scale, size=10000)
par = truncnorm.fit(r, scale= scale_guess, loc = loc_guess)
ax.plot(x, truncnorm.pdf(x, *par),
        'b-', lw=1, alpha=0.6, label='truncnorm fit')
ax.hist(r, density=True, histtype='stepfilled', alpha=0.3)
plt.legend()
plt.show()

1st example 2nd example

I also often have this warning:

/home/elie/anaconda2/envs/py36/lib/python3.6/site-packages/scipy/stats/_continuous_distns.py:5823: RuntimeWarning: divide by zero encountered in log self._logdelta = np.log(self._delta)

解决方案

As you have discovered, the problem is that the parameters that you want to keep fixed, xa and xb, are not native parameters of truncnorm. truncnorm has the shape parameters a and b, which determine the shape by setting the x-interval for the standard normal distribution. This shape is then shifted and scaled by the loc and scale parameters. The relation is

xa = a*scale + loc
xb = b*scale + loc

To fix xa and xb, you can use one of the SciPy minimizers that accepts equality constraints. Here I'll use scipy.optimize.fmin_slsqp. (You could instead use the "omnibus" function scipy.optmize.minimize, which includes the SLSQP solver as one of its options.)

Here's a script that demonstrate how to use fmin_slsqp for this problem. The function func is the objective function to be minimized. It is just a wrapper for truncnorm.nnlf, the negative log-likelihood function. The function constraint returns an array containing two values. These values are 0 when the constraint is satisfied.

import numpy as np
from scipy.stats import truncnorm
from scipy.optimize import fmin_slsqp

import matplotlib.pyplot as plt


def func(p, r, xa, xb):
    return truncnorm.nnlf(p, r)


def constraint(p, r, xa, xb):
    a, b, loc, scale = p
    return np.array([a*scale + loc - xa, b*scale + loc - xb])


xa, xb = 30, 250 
loc = 50
scale = 75

a = (xa - loc)/scale
b = (xb - loc)/scale

# Generate some data to work with.
r = truncnorm.rvs(a, b, loc=loc, scale=scale, size=10000)

loc_guess = 30
scale_guess = 90
a_guess = (xa - loc_guess)/scale_guess
b_guess = (xb - loc_guess)/scale_guess
p0 = [a_guess, b_guess, loc_guess, scale_guess]

par = fmin_slsqp(func, p0, f_eqcons=constraint, args=(r, xa, xb),
                 iprint=False, iter=1000)

xmin = 0
xmax = 300
x = np.linspace(xmin, xmax, 1000)

fig, ax = plt.subplots(1, 1)
ax.plot(x, truncnorm.pdf(x, a, b, loc=loc, scale=scale),
        'r-', lw=3, alpha=0.4, label='truncnorm pdf')
ax.plot(x, truncnorm.pdf(x, *par),
        'k--', lw=1, alpha=1.0, label='truncnorm fit')
ax.hist(r, bins=15, density=True, histtype='stepfilled', alpha=0.3)
ax.legend(shadow=True)
plt.xlim(xmin, xmax)
plt.grid(True)

plt.show()

Here's the plot that it generates. The sample data is random, so the plot will be different on each run.

Note: occasionally a random data set is generated for which fmin_slsqp fails with an "invalid value encountered" during a calculation. I haven't investigated this further, but you might run into this with your data.

这篇关于使用 scipy truncnorm 拟合数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆