Linear fit including all errors with NumPy/SciPy


Question

I have a lot of x-y data points with errors on y, and I need to fit non-linear functions to them. The functions can be linear in some cases, but are more usually exponential decays, Gaussian curves and so on. SciPy supports this kind of fitting with scipy.optimize.curve_fit, and I can also specify the weight of each point. This gives me weighted non-linear fitting, which is great. From the results, I can extract the parameters and their respective errors.

There is just one caveat: the errors on the data points are only used as weights, but are not propagated into the errors of the fit parameters. If I double the errors on all of my data points, I would expect the uncertainty of the result to increase as well. So I built a test case (source code) to test this.

Fit with scipy.optimize.curve_fit gives me:

Parameters: [ 1.99900756  2.99695535]
Errors:     [ 0.00424833  0.00943236]

Same but with 2 * y_err:

Parameters: [ 1.99900756  2.99695535]
Errors:     [ 0.00424833  0.00943236]

So you can see that the values are identical. This tells me that the algorithm does not take the errors into account, but I think the values should be different.
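The observation above can be reproduced with a small sketch. This is not the original test case: the synthetic data (y = 2x + 3 with Gaussian noise), the helper name fit_errors and the seed are assumptions made for illustration. It also shows the absolute_sigma keyword that newer SciPy versions of curve_fit provide; with absolute_sigma=True the given sigma is treated as an absolute error, so the parameter errors should grow with it.

```python
import numpy as np
from scipy.optimize import curve_fit


def linear(x, a, b):
    """Straight line used as the fit model."""
    return a * x + b


# Synthetic data (assumption): y = 2x + 3 with Gaussian noise of width y_err.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y_err = np.full_like(x, 0.1)
y = linear(x, 2.0, 3.0) + rng.normal(0.0, y_err)


def fit_errors(sigma, absolute_sigma):
    """Return fitted parameters and their one-sigma errors (hypothetical helper)."""
    popt, pcov = curve_fit(linear, x, y, p0=[1.0, 1.0],
                           sigma=sigma, absolute_sigma=absolute_sigma)
    return popt, np.sqrt(np.diag(pcov))


# Default behaviour: sigma acts only as a relative weight.
_, err_rel_1 = fit_errors(y_err, absolute_sigma=False)
_, err_rel_2 = fit_errors(2 * y_err, absolute_sigma=False)

# absolute_sigma=True: sigma is taken as the actual error on each point.
_, err_abs_1 = fit_errors(y_err, absolute_sigma=True)
_, err_abs_2 = fit_errors(2 * y_err, absolute_sigma=True)

print("relative, y_err:    ", err_rel_1)
print("relative, 2 * y_err:", err_rel_2)
print("absolute, y_err:    ", err_abs_1)
print("absolute, 2 * y_err:", err_abs_2)
```

In the relative mode the two error vectors should agree, as in the output quoted above, while in the absolute mode doubling y_err should double the parameter errors.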

I read about another fit method here as well, so I also tried fitting with scipy.odr:

Beta: [ 2.00538124  2.95000413]
Beta Std Error: [ 0.00652719  0.03870884]

Same but with 20 * y_err:

Beta: [ 2.00517894  2.9489472 ]
Beta Std Error: [ 0.00642428  0.03647149]

The values are slightly different, but I do not think that this accounts for the increase in the error at all. I think this is just rounding error or slightly different weighting.
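For reference, a scipy.odr fit of this kind might look like the sketch below. The synthetic data, the starting values in beta0 and the helper name odr_fit are assumptions, not the original test case. It only prints beta and sd_beta for the two error scales so that one can check how the reported standard errors react.

```python
import numpy as np
from scipy import odr


def linear(beta, x):
    """Model in the form scipy.odr expects: parameters first, then x."""
    return beta[0] * x + beta[1]


# Synthetic data (assumption): y = 2x + 3 with Gaussian noise of width y_err.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y_err = np.full_like(x, 0.1)
y = 2.0 * x + 3.0 + rng.normal(0.0, y_err)


def odr_fit(sy):
    """Run an ODR fit with the given y errors (hypothetical helper)."""
    data = odr.RealData(x, y, sy=sy)
    fit = odr.ODR(data, odr.Model(linear), beta0=[1.0, 1.0])
    return fit.run()


out_1 = odr_fit(y_err)
out_20 = odr_fit(20 * y_err)

print("Beta:", out_1.beta)
print("Beta Std Error:", out_1.sd_beta)
print("Same but with 20 * y_err:")
print("Beta:", out_20.beta)
print("Beta Std Error:", out_20.sd_beta)
```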

Is there a package that allows me to fit the data and get the actual errors? I have the formulas here in a book, but I do not want to implement them myself if I do not have to.

I have now read about linfit.py in another question. It handles what I have in mind quite well. It supports both modes (errors treated as absolute, or as relative weights with relsigma=True), and the first one is what I need.

Fit with linfit:
Parameters: [ 2.02600849  2.91759066]
Errors:     [ 0.00772283  0.04449971]

Same but with 20 * y_err:
Parameters: [ 2.02600849  2.91759066]
Errors:     [ 0.15445662  0.88999413]

Fit with linfit(relsigma=True):
Parameters: [ 2.02600849  2.91759066]
Errors:     [ 0.00622595  0.03587451]

Same but with 20 * y_err:
Parameters: [ 2.02600849  2.91759066]
Errors:     [ 0.00622595  0.03587451]
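The difference between the two modes can be illustrated without linfit itself. The following is a NumPy sketch of the textbook weighted least-squares formulas, not linfit's actual implementation. The function name weighted_linfit, its relsigma argument (named after the option shown above) and the synthetic data are assumptions.

```python
import numpy as np


def weighted_linfit(x, y, y_err, relsigma=False):
    """Weighted straight-line fit; returns ([slope, intercept], errors).

    relsigma=False: y_err is an absolute error, covariance = (A^T W A)^-1.
    relsigma=True:  y_err is only a relative weight, so the covariance is
                    rescaled by the reduced chi-square of the fit.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(y_err, dtype=float) ** 2
    A = np.column_stack([x, np.ones_like(x)])      # design matrix
    cov = np.linalg.inv(A.T @ (A * w[:, None]))    # (A^T W A)^-1
    params = cov @ (A.T @ (w * y))
    if relsigma:
        chi2 = np.sum(w * (y - A @ params) ** 2)
        cov = cov * chi2 / (len(x) - 2)
    return params, np.sqrt(np.diag(cov))


# Synthetic data (assumption): y = 2x + 3 with Gaussian noise of width y_err.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y_err = np.full_like(x, 0.1)
y = 2.0 * x + 3.0 + rng.normal(0.0, y_err)

p_1, err_abs_1 = weighted_linfit(x, y, y_err)
p_20, err_abs_20 = weighted_linfit(x, y, 20 * y_err)
_, err_rel_1 = weighted_linfit(x, y, y_err, relsigma=True)
_, err_rel_20 = weighted_linfit(x, y, 20 * y_err, relsigma=True)

print("absolute:", err_abs_1, err_abs_20)
print("relative:", err_rel_1, err_rel_20)
```

With absolute errors, scaling y_err by 20 scales the parameter errors by 20, which matches the first pair of outputs above. With relsigma=True the rescaling by the reduced chi-square cancels the factor, so the errors stay the same.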


Should I answer my question or just close/delete it now?

Recommended answer

One way that works well and actually gives a better result is the bootstrap method. When data points with errors are given, one uses a parametric bootstrap and lets each x and y value describe a Gaussian distribution. One then draws a point from each of those distributions and obtains a new bootstrapped sample. Performing a simple unweighted fit on this sample gives one value for the parameters.

This process is repeated some 300 to a couple of thousand times. One ends up with a distribution of the fit parameters, from which one can take the mean and standard deviation to obtain the value and its error.
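The procedure described above could be sketched as follows for a straight-line fit. The function name bootstrap_linear_fit, the synthetic data and the choice of np.polyfit as the unweighted fit are assumptions. For brevity only y is resampled; x errors could be handled the same way by drawing x from its own Gaussian.

```python
import numpy as np


def bootstrap_linear_fit(x, y, y_err, n_boot=1000, seed=0):
    """Parametric bootstrap: returns an (n_boot, 2) array of [slope, intercept]."""
    rng = np.random.default_rng(seed)
    params = np.empty((n_boot, 2))
    for i in range(n_boot):
        # Draw one new point from the Gaussian around each measured y value.
        y_sample = rng.normal(y, y_err)
        # Simple unweighted fit to the resampled data.
        params[i] = np.polyfit(x, y_sample, 1)
    return params


# Synthetic data (assumption): y = 2x + 3 with Gaussian noise of width y_err.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y_err = np.full_like(x, 0.1)
y = 2.0 * x + 3.0 + rng.normal(0.0, y_err)

params = bootstrap_linear_fit(x, y, y_err)
values = params.mean(axis=0)
errors = params.std(axis=0, ddof=1)

# Same data with doubled errors: the spread of the parameters doubles too.
params_2 = bootstrap_linear_fit(x, y, 2 * y_err)
errors_2 = params_2.std(axis=0, ddof=1)

print("Parameters:", values)
print("Errors:    ", errors)
print("Errors with 2 * y_err:", errors_2)
```

Unlike the relative-weight fits in the question, the errors obtained this way respond directly to the size of y_err.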

Another neat thing is that one does not obtain a single fit curve as a result, but lots of them. For each interpolated x value one can again take the mean and standard deviation of the many values f(x, param) and obtain an error band:

[Figure: fit curve with bootstrapped error band]
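A minimal sketch of how such an error band could be computed, again assuming a straight-line model, synthetic data and np.polyfit as the unweighted fit (none of which come from the original answer):

```python
import numpy as np

# Synthetic data (assumption): y = 2x + 3 with Gaussian noise of width y_err.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y_err = np.full_like(x, 0.1)
y = 2.0 * x + 3.0 + rng.normal(0.0, y_err)

# Parametric bootstrap: one unweighted fit per resampled data set.
n_boot = 500
params = np.array([np.polyfit(x, rng.normal(y, y_err), 1)
                   for _ in range(n_boot)])

# Evaluate every bootstrapped curve f(x, param) on a fine grid of x values.
x_grid = np.linspace(x.min(), x.max(), 200)
curves = params[:, [0]] * x_grid + params[:, [1]]   # shape (n_boot, 200)

# Mean and standard deviation across the curves give the band.
band_mean = curves.mean(axis=0)
band_std = curves.std(axis=0, ddof=1)

print("band width at left edge: ", band_std[0])
print("band width at right edge:", band_std[-1])
```

The band can then be drawn, for example, with matplotlib's fill_between(x_grid, band_mean - band_std, band_mean + band_std).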

Further steps in the analysis are then performed again hundreds of times, once with each set of fit parameters. This also takes the correlation of the fit parameters into account, as one can see clearly in the plot above: although a symmetric function was fitted to the data, the error band is asymmetric. This means that interpolated values on the left have a larger uncertainty than those on the right.
