如何计算科学曲线拟合的可能性? [英] How to calculate the likelihood of curve-fitting in scipy?

查看:70
本文介绍了如何计算科学曲线拟合的可能性?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个看起来像这样的非线性模型:

I have a nonlinear model fit that looks like this:

黑色实线是模型拟合,灰色部分是原始数据.

The dark solid line is the model fit, and the grey part is the raw data.

问题的简短版本:如何获得该模型拟合的可能性,以便执行对数似然比检验?假设残差是正态分布的.

Short version of the question: how do I get the likelihood of this model fit, so I can perform log-likelihood ratio test? Assume that the residual is normally distributed.

我是统计学的新手,我目前的想法是:

I am relatively new to statistics, and my current thoughts are:

  1. 从曲线拟合中获得残差,并计算残差的方差;

  1. Get the residual from the curve fit, and calculate the variance of residual;

使用此等式 并将残差方差插入到sigma-squared,以x_i作为实验,以mu作为模型拟合;

Use this equation And plug in the variance of residual into sigma-squared, x_i as experiment and mu as model fit;

计算对数似然比.

有人对这两个完整版本的问题有帮助吗?

Could anyone help me, with these two full-version questions?

  1. 我的方法正确吗? (我认为是这样,但是确保它确实很棒!)

  1. Is my method correct? (I think so, but it would be really great to make sure!)

python/scipy/statsmodels中是否有现成的函数可以帮我做到这一点?

Are there any ready-made functions in python/scipy/statsmodels to do this for me?

推荐答案

您的似然函数

简单来说就是高斯分布的概率密度函数的对数之和.

which is simply the sum of log of probability density function of Gaussian distribution.

为残基拟合mu和sigma的可能性,而不是根据您的数据的模型的可能性.一句话,您的方法是错误.

is the likelihood of fitting a mu and a sigma for your residue, not the likelihood of your model given your data. In one word, your approach is wrong.

假设您正在执行非线性最小二乘,请按照已经提到的@usethedeathstar,直接进行F-test运算. .考虑以下示例,该示例是从 http://www.walkingrandomly.com/?p=5254,然后我们使用R进行F-test.最后,我们将讨论如何将其转换为python.

Sine you are doing non-linear least square, following what @usethedeathstar already mentioned, you should go straight for F-test. . Consider the following example, modified from http://www.walkingrandomly.com/?p=5254, and we conduct F-test using R. And we will discuss how to translate it into python in the end.

# construct the data vectors using c()
> xdata = c(-2,-1.64,-1.33,-0.7,0,0.45,1.2,1.64,2.32,2.9)
> ydata = c(0.699369,0.700462,0.695354,1.03905,1.97389,2.41143,1.91091,0.919576,-0.730975,-1.42001)
# some starting values
> p1 = 1
> p2 = 0.2
> p3 = 0.01

# do the fit
> fit1 = nls(ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata), start=list(p1=p1,p2=p2))
> fit2 = nls(ydata ~ p1*cos(p2*xdata) + p2*sin(p1*xdata)+p3*xdata, start=list(p1=p1,p2=p2,p3=p3))

# summarise
> summary(fit1)

Formula: ydata ~ p1 * cos(p2 * xdata) + p2 * sin(p1 * xdata)

Parameters:
   Estimate Std. Error t value Pr(>|t|)    
p1 1.881851   0.027430   68.61 2.27e-12 ***
p2 0.700230   0.009153   76.51 9.50e-13 ***
---
Signif. codes:  0 ?**?0.001 ?*?0.01 ??0.05 ??0.1 ??1

Residual standard error: 0.08202 on 8 degrees of freedom

Number of iterations to convergence: 7 
Achieved convergence tolerance: 2.189e-06

> summary(fit2)

Formula: ydata ~ p1 * cos(p2 * xdata) + p2 * sin(p1 * xdata) + p3 * xdata

Parameters:
   Estimate Std. Error t value Pr(>|t|)    
p1  1.90108    0.03520  54.002 1.96e-10 ***
p2  0.70657    0.01167  60.528 8.82e-11 ***
p3  0.02029    0.02166   0.937     0.38    
---
Signif. codes:  0 ?**?0.001 ?*?0.01 ??0.05 ??0.1 ??1

Residual standard error: 0.08243 on 7 degrees of freedom

Number of iterations to convergence: 9 
Achieved convergence tolerance: 2.476e-06

> anova(fit2, fit1)
Analysis of Variance Table

Model 1: ydata ~ p1 * cos(p2 * xdata) + p2 * sin(p1 * xdata) + p3 * xdata
Model 2: ydata ~ p1 * cos(p2 * xdata) + p2 * sin(p1 * xdata)
  Res.Df Res.Sum Sq Df     Sum Sq F value Pr(>F)
1      7   0.047565                             
2      8   0.053813 -1 -0.0062473  0.9194 0.3696

这里有两个模型,fit1有2个参数,因此残基具有8个自由度; fit2具有一个附加参数,且残基具有7个自由度.模型2明显更好吗?不,在(1,7)自由度上,F值为0.9194,并不重要.

here we have two model, fit1 has 2 parameters, therefore the residue has 8 degrees-of-freedom; fit2 has one additional parameter and the residue has 7 degrees of freedom. Is model 2 significantly better? No, the F value is 0.9194, on (1,7) degrees of freedom and it is not significant.

获得ANOVA表:残留DF很简单.残差平方和:0.08202*0.08202*8=0.053810.08243*0.08243*7=0.04756293(注意:残差标准误差:7个自由度上的0.08243" 等).在python中,您可以通过(y_observed-y_fitted)**2来获得它,因为scipy.optimize.curve_fit()不会返回残基.

To get the ANOVA table: Residue DF is easy. Residue Sum of squares: 0.08202*0.08202*8=0.05381 and 0.08243*0.08243*7=0.04756293 (notice: 'Residual standard error: 0.08243 on 7 degrees of freedom', etc). In python, you can get it by (y_observed-y_fitted)**2, since scipy.optimize.curve_fit() doesn't return the residues.

F-ratio0.0062473/0.047565*7,要获得P值:1-scipy.stats.f.cdf(0.9194, 1, 7).

将它们放在一起,我们就有python个等效项:

Put them together we have python equivalent:

In [1]:

import scipy.optimize as so
import scipy.stats as ss
xdata = np.array([-2,-1.64,-1.33,-0.7,0,0.45,1.2,1.64,2.32,2.9])
ydata = np.array([0.699369,0.700462,0.695354,1.03905,1.97389,2.41143,1.91091,0.919576,-0.730975,-1.42001])
def model0(x,p1,p2):
    return p1*np.cos(p2*x) + p2*np.sin(p1*x)
def model1(x,p1,p2,p3):
    return p1*np.cos(p2*x) + p2*np.sin(p1*x)+p3*x
p1, p2, p3 = 1, 0.2, 0.01
fit0=so.curve_fit(model0, xdata, ydata, p0=(p1,p2))[0]
fit1=so.curve_fit(model1, xdata, ydata, p0=(p1,p2,p3))[0]
yfit0=model0(xdata, fit0[0], fit0[1])
yfit1=model1(xdata, fit1[0], fit1[1], fit1[2])
ssq0=((yfit0-ydata)**2).sum()
ssq1=((yfit1-ydata)**2).sum()
df=len(xdata)-3
f_ratio=(ssq0-ssq1)/(ssq1/df)
p=1-ss.f.cdf(f_ratio, 1, df)
In [2]:

print f_ratio, p
0.919387419515 0.369574503394

就像@usethedeathstar指出的那样:当残差呈正态分布时,非线性最小二乘法 IS 的可能性最大.因此,F检验和似然比检验是等效的.因为 F比率是似然比λ的单调变换.

As @usethedeathstar pointed out: when you the residue is normally distributed, nonlinear least square IS the maximum likelihood. Therefore F-test and likelihood ratio test is equivalent. Because, F-ratio is a monotone transformation of the likelihood ratio λ.

或以描述性方式,请参见: http://www.stata.com/support/faqs/statistics/chi-squared-and-f-distributions/

Or in a descriptive way, see: http://www.stata.com/support/faqs/statistics/chi-squared-and-f-distributions/

这篇关于如何计算科学曲线拟合的可能性?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆