Python - calculating trendlines with errors

Problem description

So I've got some data stored as two lists, and plotted them using

plot(datasetx, datasety)

Then I set up a trendline with

trend = polyfit(datasetx, datasety, 2)  # degree 2: quadratic trendline
trendx = []
trendy = []

for a in range(datasetx[0], (datasetx[-1]+1)):
    trendx.append(a)
    trendy.append(trend[0]*a**2 + trend[1]*a + trend[2])

plot(trendx, trendy)

But I have a third list of data, which is the error in the original datasety. I can plot the error bars fine, but what I don't know is how to use this list to find the error in the coefficients of the polynomial trendline.

So, if my trendline came out as y = 5x^2 + 3x + 4, there should be some sort of error on the 5, 3 and 4 values.

Is there a tool using NumPy that will calculate this for me?

Recommended answer

I think you can use the function curve_fit from scipy.optimize (see its documentation). A basic example of its usage:

import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a*x**2 + b*x + c

x = np.linspace(0,4,50)
y = func(x, 5, 3, 4)
yn = y + 0.2*np.random.normal(size=len(x))

popt, pcov = curve_fit(func, x, yn)

According to the documentation, pcov is:

The estimated covariance of popt. The diagonals provide the variance of the parameter estimate.

So this gives you an error estimate on the coefficients: the square root of each diagonal element (a variance) is the standard deviation of the corresponding coefficient.
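
A minimal sketch of that step, rebuilding the quadratic `func` and synthetic data from the basic example so it runs on its own (the fixed seed and the name `perr` are illustrative choices, not part of the original answer). `np.diag` pulls the variances off the diagonal of `pcov` and `np.sqrt` turns them into one standard error per coefficient:

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    # Quadratic model y = a*x**2 + b*x + c
    return a * x**2 + b * x + c

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
x = np.linspace(0, 4, 50)
yn = func(x, 5, 3, 4) + 0.2 * rng.normal(size=len(x))

popt, pcov = curve_fit(func, x, yn)

# Variances sit on the diagonal of pcov; their square roots are the
# one-standard-deviation uncertainties of a, b and c.
perr = np.sqrt(np.diag(pcov))

for name, value, err in zip("abc", popt, perr):
    print(f"{name} = {value:.3f} +/- {err:.3f}")
```

Taking the whole diagonal at once avoids indexing `pcov[0,0]`, `pcov[1,1]`, ... by hand and works for any number of parameters.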

Now you have an error on the coefficients, but it is based only on the deviation between the ydata and the fit. If you also want to account for the error on the ydata itself, the curve_fit function provides the sigma argument:

sigma : None or N-length sequence

If not None, it represents the standard-deviation of ydata. This vector, if given, will be used as weights in the least-squares problem.
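
Written out in my own notation (following the description in the SciPy documentation; N is the number of points and the model is the quadratic above), passing sigma makes curve_fit minimise the weighted sum of squares χ² below. Because this model is linear in a, b and c, pcov is the inverse of the weighted normal matrix built from the rows J_i. In newer SciPy versions, `absolute_sigma=True` returns that matrix as is (C_abs), while the default `absolute_sigma=False` rescales it by the reduced chi-square (C_rel), so that only the relative sizes of the σ_i matter:

```latex
\chi^2(a, b, c) = \sum_{i=1}^{N} \left( \frac{y_i - (a x_i^2 + b x_i + c)}{\sigma_i} \right)^2

C_{\mathrm{abs}} = \left( \sum_{i=1}^{N} \frac{J_i^{\mathsf{T}} J_i}{\sigma_i^2} \right)^{-1},
\qquad J_i = \begin{pmatrix} x_i^2 & x_i & 1 \end{pmatrix}

C_{\mathrm{rel}} = C_{\mathrm{abs}} \, \frac{\chi^2_{\min}}{N - 3},
\qquad \sigma_a = \sqrt{C_{11}}, \quad \sigma_b = \sqrt{C_{22}}, \quad \sigma_c = \sqrt{C_{33}}
```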

Full example:

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

def func(x, a, b, c):
    return a*x**2 + b*x + c

x = np.linspace(0,4,20)
y = func(x, 5, 3, 4)
# error on ydata: 20% of the true value (always positive, as errorbar requires)
y_sigma = 0.2 * y
# generate noisy ydata consistent with that error
yn = y + y_sigma * np.random.normal(size=len(x))

# absolute_sigma=True treats y_sigma as real standard deviations,
# so pcov is not rescaled by the reduced chi-square of the fit
popt, pcov = curve_fit(func, x, yn, sigma=y_sigma, absolute_sigma=True)

# plot
fig = plt.figure()
ax = fig.add_subplot(111)
ax.errorbar(x, yn, yerr=y_sigma, fmt='o')
# popt is [a, b, c], highest power first, which is the order np.polyval expects
ax.plot(x, np.polyval(popt, x), '-')
ax.text(0.5, 100, r"a = {0:.3f} +/- {1:.3f}".format(popt[0], pcov[0,0]**0.5))
ax.text(0.5, 90, r"b = {0:.3f} +/- {1:.3f}".format(popt[1], pcov[1,1]**0.5))
ax.text(0.5, 80, r"c = {0:.3f} +/- {1:.3f}".format(popt[2], pcov[2,2]**0.5))
ax.grid()
plt.show()
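
One caveat worth checking against your SciPy version: newer releases of curve_fit also take `absolute_sigma` (default `False`). With the default, sigma acts only as relative weights and pcov is rescaled so the reduced chi-square equals one, so multiplying every sigma by the same factor leaves pcov unchanged. With `absolute_sigma=True`, doubling sigma doubles the standard errors. The sketch below checks both behaviours on synthetic data. The seed, the factor of 2 and the variable names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    # Quadratic model y = a*x**2 + b*x + c
    return a * x**2 + b * x + c

rng = np.random.default_rng(1)
x = np.linspace(0, 4, 20)
y = func(x, 5, 3, 4)
y_sigma = 0.2 * y                              # per-point standard deviation
yn = y + y_sigma * rng.normal(size=len(x))     # noise drawn with that sigma

# Same data, with the errors quoted once as y_sigma and once as 2*y_sigma.
_, pcov_rel_1 = curve_fit(func, x, yn, sigma=y_sigma)
_, pcov_rel_2 = curve_fit(func, x, yn, sigma=2 * y_sigma)
_, pcov_abs_1 = curve_fit(func, x, yn, sigma=y_sigma, absolute_sigma=True)
_, pcov_abs_2 = curve_fit(func, x, yn, sigma=2 * y_sigma, absolute_sigma=True)

# Relative weights: the overall scale of sigma drops out of pcov.
print(np.sqrt(np.diag(pcov_rel_1)), np.sqrt(np.diag(pcov_rel_2)))
# Absolute sigma: doubling sigma doubles the standard errors.
print(np.sqrt(np.diag(pcov_abs_1)), np.sqrt(np.diag(pcov_abs_2)))
```

If the third list in the question holds genuine one-standard-deviation errors, `absolute_sigma=True` is the setting that makes them show up in the coefficient errors.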

Then something else, about using numpy arrays. One of the main advantages of numpy is that you can avoid for loops, because operations on arrays apply elementwise. So the for loop in your example can also be written as follows:

trendx = arange(datasetx[0], (datasetx[-1]+1))
trendy = trend[0]*trendx**2 + trend[1]*trendx + trend[2]

Here I use arange instead of range because it returns a numpy array rather than a list. In this case you can also use the numpy function polyval:

trendy = polyval(trend, trendx)
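
Since the question asks for a NumPy tool specifically, note that numpy.polyfit can itself return the coefficient covariance through its `cov` argument and accept per-point weights through `w`. This sketch assumes a recent NumPy (`cov="unscaled"` needs 1.16 or later) and uses `w = 1/sigma`, the inverse-variance choice described in the polyfit documentation. To check it, the sketch fits the same synthetic data with curve_fit and `absolute_sigma=True`. The seed and variable names are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    # Quadratic model y = a*x**2 + b*x + c
    return a * x**2 + b * x + c

rng = np.random.default_rng(2)
x = np.linspace(0, 4, 20)
y = func(x, 5, 3, 4)
y_sigma = 0.2 * y
yn = y + y_sigma * rng.normal(size=len(x))

# Weighted quadratic fit with NumPy alone. w multiplies the unsquared
# residuals, so w = 1/sigma gives the usual chi-square; cov="unscaled"
# keeps sigma absolute instead of rescaling by the residual scatter.
coef, cov = np.polyfit(x, yn, 2, w=1 / y_sigma, cov="unscaled")
coef_err = np.sqrt(np.diag(cov))

# The same fit through curve_fit, treating y_sigma as absolute errors.
popt, pcov = curve_fit(func, x, yn, sigma=y_sigma, absolute_sigma=True)

print("polyfit:  ", coef, "+/-", coef_err)
print("curve_fit:", popt, "+/-", np.sqrt(np.diag(pcov)))

# polyfit returns coefficients highest power first, so polyval evaluates the trendline.
trendy = np.polyval(coef, x)
```

With `cov=True` instead of `"unscaled"`, polyfit rescales the covariance by the residual scatter, which corresponds to curve_fit's default `absolute_sigma=False`.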
