How to return the fit error in Python curve_fit

Problem description

I'm trying to fit a function to a data set from an experiment using Python. I can get a really good approximation and the fit looks pretty good, but the errors given for the parameters are incredibly high and I'm not sure how to fix this.

The function looks like this (image not preserved; it is implemented as func in the code below):

The data consist of a time data set and a y data set. The variable "ve" is a linear velocity function, which is why it is replaced with "a*x+b" in the code. The fit looks really good, and theoretically the function should fit the data, but the parameter errors are extremely high. The code is the following:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from lmfit import Model
from scipy.optimize import curve_fit

# First 18 rows: column 0 is time t [s], column 1 is height z [m]
data6 = pd.read_csv('2594.csv')
x = data6.iloc[:18, 0]
y = data6.iloc[:18, 1]

# ve = a*x + b stands in for the linear velocity function
def func(x, a, b, mo, q):
    return (4.9 + a*x + b)*x + (a*x + b)*((mo/q) - x)*np.log(1 - (q*x)/mo) - 0.5*9.8*x*x

popt, pcov = curve_fit(func, x, y, bounds=((0, -100, 0, 0), (1000, 1000, 1, 1)))
# The label string must stay on one logical line (the original broke it mid-literal)
plt.plot(x, func(x, *popt), 'g--',
         label='fit: a=%5.3f, b=%5.3f, mo=%5.3f, q=%5.3f' % tuple(popt))
plt.plot(x, y, ':', label='Daten')
plt.grid(True)
plt.legend(loc='upper left')
plt.xlabel("t [s]")
plt.ylabel("z [m]")
plt.title('Anpassung vor Zeitpunkt T')
plt.savefig('fit1.pdf')
plt.show()

Here is the fit produced by this code (image "Fit1" not preserved).

And the covariance matrix:

[[ 3.66248820e+09  2.88800781e+09 -5.59803683e+06 -4.01121935e+05]
 [ 2.88800781e+09  2.27731332e+09 -4.44058731e+06 -3.17108449e+05]
 [-5.59803683e+06 -4.44058731e+06  2.43805434e+05  7.83731345e+03]
 [-4.01121935e+05 -3.17108449e+05  7.83731345e+03  2.65778118e+02]]

I also tried the following lmfit approach, but I get errors of over 1400%:

fmodel = Model(func)
result = fmodel.fit(y, x=x, a=14, b=3.9, mo=0.8, q=0.002)

This is the fit report:

a:   926.607518 +/- 182751.047 (19722.59%) (init = 14)
b:   737.755741 +/- 143994.520 (19517.91%) (init = 3.9)
mo:  0.27745681 +/- 27.5360933 (9924.46%) (init = 0.8)
q:   0.00447098 +/- 0.60437392 (13517.72%) (init = 0.002)

And this is the resulting fit (image "Fit2" not preserved).

I would really appreciate some help, ideally a simple guide on how to reduce the errors of the fit parameters!

The data looks like this:

x=[0.0333 0.0667 0.1    0.133  0.167  0.2    0.233  0.267  0.3    0.333  
   0.367  0.4    0.433  0.467  0.5    0.533  0.567  0.6   ]
y=[0.104 0.249 0.422 0.6   0.791 1.    1.23  1.47  1.74  2.02  2.33  2.64
   2.99  3.34  3.71  4.08  4.47  4.85 ]

Thank you!

Solution

If you had printed out the full fit report from lmfit (or properly untangled the components of the covariance matrix from curve_fit), you would see that the parameters a and b are 100% correlated.
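For curve_fit, untangling pcov takes two steps: the one-standard-deviation parameter errors are the square roots of its diagonal, and dividing pcov by the outer product of those errors gives the correlation matrix. A minimal sketch using only numpy, applied to the covariance matrix posted in the question (parameter order a, b, mo, q, as in func):

```python
import numpy as np

# Covariance matrix returned by curve_fit in the question (order: a, b, mo, q)
pcov = np.array([
    [3.66248820e09, 2.88800781e09, -5.59803683e06, -4.01121935e05],
    [2.88800781e09, 2.27731332e09, -4.44058731e06, -3.17108449e05],
    [-5.59803683e06, -4.44058731e06, 2.43805434e05, 7.83731345e03],
    [-4.01121935e05, -3.17108449e05, 7.83731345e03, 2.65778118e02],
])

# One-standard-deviation parameter errors: square roots of the diagonal
perr = np.sqrt(np.diag(pcov))

# Correlation matrix: each covariance divided by the product of the two errors
corr = pcov / np.outer(perr, perr)

names = ["a", "b", "mo", "q"]
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"C({names[i]}, {names[j]}) = {corr[i, j]:+.3f}")
```

For the posted matrix, C(a, b) prints as +1.000 and C(mo, q) comes out at roughly +0.97, so the large errors come from parameters that trade off against each other rather than from a poor fit.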

Basically, this is the fitting algorithm telling you that your data is not described well by your model and that you don't need that many parameters (or perhaps these parameters and this model) to describe your data.

Indeed, if you plot the data, you see a gentle upward slope. Your function is the sum of three different terms:

   (4.9+a*x+b)*x
 + (a*x+b)*((mo/q)-x)*log(1-x/(mo/q)) 
 - 0.5*9.8*x*x

There are a couple of things to note:

  1. mo and q only appear together, as mo/q, so they will not be independent.
  2. a and b only appear together, and in the same form in multiple places.
  3. There are two purely quadratic terms in x, one of them hardwired (-0.5*9.8*x*x).
  4. The logarithmic term also has a quadratic prefactor. Importantly, the x data span only about one order of magnitude, so the log term won't vary by much, giving mostly a third quadratic term. (As an aside: if you're taking logs, you should ensure that the argument is actually positive; log(1-x*a) is asking for trouble.)
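Points 2 and 4 can be checked numerically. Writing M = mo/q and expanding the log for small x/M, the model is approximately 4.9*x + (b/(2*M) - 4.9)*x**2 + (a/(2*M) + b/(6*M**2))*x**3 + ..., so a, b and mo/q enter mainly through the ratios a/M and b/M. The sketch below takes the parameter values from the lmfit report in the question, scales a, b and mo/q together by 10% (via q -> q/1.1), and measures how far the curve moves; the factor k = 1.1 is an arbitrary choice for illustration.

```python
import numpy as np

# Model from the question (x = time, ve = a*x + b)
def func(x, a, b, mo, q):
    return (4.9 + a*x + b)*x + (a*x + b)*((mo/q) - x)*np.log(1 - (q*x)/mo) - 0.5*9.8*x*x

# Time points from the question
x = np.array([0.0333, 0.0667, 0.1, 0.133, 0.167, 0.2, 0.233, 0.267, 0.3,
              0.333, 0.367, 0.4, 0.433, 0.467, 0.5, 0.533, 0.567, 0.6])

# Best-fit values from the lmfit report in the question
a, b, mo, q = 926.607518, 737.755741, 0.27745681, 0.00447098

# Scale a, b and M = mo/q by the same factor k (q -> q/k keeps mo fixed)
k = 1.1
y_ref = func(x, a, b, mo, q)
y_scaled = func(x, k*a, k*b, mo, q/k)

max_shift = np.max(np.abs(y_scaled - y_ref))
print("change in a:", k*a - a)
print("change in b:", k*b - b)
print("largest change in the curve [m]:", max_shift)
```

With these values, a changes by roughly 93 and b by roughly 74, yet the curve moves by only about a millimetre, far less than the spread of the data. That is why the fitter cannot pin down a, b, mo and q individually.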

To summarize: your model is too complicated for your data.

I very much doubt you need the log term at all. As it turns out, you can get a pretty good fit with a simple parabolic model:

import numpy as np
import matplotlib.pyplot as plt
from lmfit.models import ParabolicModel

x = np.array([0.0333, 0.0667, 0.1, 0.133, 0.167, 0.2, 0.233 , 0.267, 0.3 ,
          0.333, 0.367, 0.4, 0.433, 0.467, 0.5, 0.533 , 0.567 , 0.6 ])

y = np.array([0.104, 0.249 , 0.422, 0.6, 0.791, 1.0, 1.23, 1.47, 1.74,
          2.02, 2.33, 2.64, 2.99, 3.34, 3.71, 4.08, 4.47, 4.85 ])

qmodel = ParabolicModel()
result = qmodel.fit(y, x=x, a=1, b=2, c=0)
print(result.fit_report())
fitlabel = "fit: a=%5.3f, b=%5.3f, c=%5.3f" % (result.params['a'].value,
                                               result.params['b'].value,
                                               result.params['c'].value)
plt.plot(x, y, label='Daten')
plt.plot(x, result.best_fit, label=fitlabel)
plt.xlabel("t [s]")
plt.ylabel("z [m]")
plt.legend(loc='upper left')
plt.title("Anpassung vor Zeitpunkt T (Model: a*x^2+b*x+c)")
plt.show()

which gives the following report:

[[Model]]
    Model(parabolic)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 9
    # data points      = 18
    # variables        = 3
    chi-square         = 0.00298906
    reduced chi-square = 1.9927e-04
    Akaike info crit   = -150.657052
    Bayesian info crit = -147.985936
[[Variables]]
    c: -0.02973853 +/- 0.01120090 (37.66%) (init = 0)
    b:  3.67707491 +/- 0.08142567 (2.21%) (init = 2)
    a:  7.51540814 +/- 0.12492370 (1.66%) (init = 1)
[[Correlations]] (unreported correlations are < 0.100)
    C(b, a) = -0.972
    C(c, b) = -0.891
    C(c, a) =  0.785
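Since the question is about curve_fit, the same parabolic fit can be sketched with scipy.optimize.curve_fit; the function name parabola is illustrative. With the default absolute_sigma=False, curve_fit scales pcov by the reduced chi-square, as lmfit does by default, so the square roots of its diagonal should agree with the uncertainties in the report above.

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([0.0333, 0.0667, 0.1, 0.133, 0.167, 0.2, 0.233, 0.267, 0.3,
              0.333, 0.367, 0.4, 0.433, 0.467, 0.5, 0.533, 0.567, 0.6])
y = np.array([0.104, 0.249, 0.422, 0.6, 0.791, 1.0, 1.23, 1.47, 1.74,
              2.02, 2.33, 2.64, 2.99, 3.34, 3.71, 4.08, 4.47, 4.85])

def parabola(x, a, b, c):
    """Same form as lmfit's ParabolicModel: a*x**2 + b*x + c."""
    return a*x**2 + b*x + c

popt, pcov = curve_fit(parabola, x, y, p0=(1, 2, 0))
perr = np.sqrt(np.diag(pcov))  # one-standard-deviation parameter errors

for name, val, err in zip("abc", popt, perr):
    print(f"{name} = {val:.5f} +/- {err:.5f} ({100*abs(err/val):.2f}%)")
```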

and a plot of the data with the parabolic fit (image not preserved).
