Fitting polynomials to data


Problem description


Is there a way, given a set of values (x,f(x)), to find the polynomial of a given degree that best fits the data?


I know polynomial interpolation, which is for finding a polynomial of degree n given n+1 data points, but here there are a large number of values and we want to find a low-degree polynomial (find best linear fit, best quadratic, best cubic, etc.). It might be related to least squares...


More generally, I would like to know the answer when we have a multivariate function -- points like (x,y,f(x,y)), say -- and want to find the best polynomial (p(x,y)) of a given degree in the variables. (Specifically a polynomial, not splines or Fourier series.)


Both theory and code/libraries (preferably in Python, but any language is okay) would be useful.

Answer


Thanks for everyone's replies. Here is another attempt at summarizing them. Pardon if I say too many "obvious" things: I knew nothing about least squares before, so everything was new to me.


Polynomial interpolation is fitting a polynomial of degree n given n+1 data points, e.g. finding a cubic that passes exactly through four given points. As said in the question, this was not what I wanted: I had a lot of points and wanted a small-degree polynomial (which will fit only approximately, unless we've been lucky). But since some of the answers insisted on talking about it, I should mention them :) Lagrange polynomial, Vandermonde matrix, etc.


"Least squares" is a particular definition/criterion/"metric" of "how well" a polynomial fits. (There are others, but this is the simplest.) Say you are trying to fit a polynomial p(x,y) = a + bx + cy + dx² + ey² + fxy to some given data points (xᵢ, yᵢ, Zᵢ) (where "Zᵢ" was "f(xᵢ,yᵢ)" in the question). With least squares the problem is to find the "best" coefficients (a, b, c, d, e, f), such that what is minimized (kept "least") is the "sum of squared residuals", namely

S = ∑ᵢ (a + bxᵢ + cyᵢ + dxᵢ² + eyᵢ² + fxᵢyᵢ − Zᵢ)²


The important idea is that if you look at S as a function of (a,b,c,d,e,f), then S is minimized at a point at which its gradient is 0. This means that for example ∂S/∂f=0, i.e. that

∑ᵢ 2(a + … + fxᵢyᵢ − Zᵢ)xᵢyᵢ = 0


and similar equations for a, b, c, d, e. Note that these are just linear equations in a…f. So we can solve them with Gaussian elimination or any of the usual methods.
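
As a concrete sketch of that (not part of the original answer; the data is made up), here is the simplest case, a straight-line fit p(x) = a + bx. The two conditions ∂S/∂a = 0 and ∂S/∂b = 0 are the "normal equations" AᵀA c = Aᵀz, which can be solved directly:

```python
import numpy as np

# Made-up data lying near the line z = 1 + 2x
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
zs = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix A: one column per basis function (here 1 and x)
A = np.column_stack([np.ones_like(xs), xs])

# Normal equations A^T A c = A^T z, solved by Gaussian elimination
c = np.linalg.solve(A.T @ A, A.T @ zs)
print("a = %.3f, b = %.3f" % (c[0], c[1]))  # a = 1.040, b = 1.990
```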


This is still called "linear least squares", because although the function we wanted was a quadratic polynomial, it is still linear in the parameters (a, b, c, d, e, f). Note that the same thing works when we want p(x,y) to be any "linear combination" of arbitrary functions fⱼ, instead of just a polynomial (= "linear combination of monomials").
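
To illustrate that last point (a hypothetical sketch, with a basis I've picked arbitrarily): fitting a linear combination of 1, x and sin(x) uses exactly the same machinery, only the columns of the matrix change.

```python
import numpy as np

xs = np.linspace(0, 6, 50)
# Synthetic noise-free data generated from 2 + 0.5*x + 3*sin(x)
zs = 2 + 0.5 * xs + 3 * np.sin(xs)

# Basis functions f_j: constant, identity, sine
A = np.column_stack([np.ones_like(xs), xs, np.sin(xs)])

# Least-squares coefficients of the linear combination
c, *_ = np.linalg.lstsq(A, zs, rcond=None)
print(c)  # close to [2, 0.5, 3]
```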


For the univariate case (when there is only one variable x, and the fⱼ are the monomials xʲ), there is NumPy's polyfit:

>>> import numpy
>>> xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> ys = [1.1, 3.9, 11.2, 21.5, 34.8, 51, 70.2, 92.3, 117.4, 145.5]
>>> p = numpy.poly1d(numpy.polyfit(xs, ys, deg=2))
>>> print(p)
       2
1.517 x + 2.483 x + 0.4927
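
Once fitted, the poly1d object can be called like a function, which is handy for checking the quality of the fit (a small follow-on sketch, reusing the data above):

```python
import numpy

xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.1, 3.9, 11.2, 21.5, 34.8, 51, 70.2, 92.3, 117.4, 145.5]
p = numpy.poly1d(numpy.polyfit(xs, ys, deg=2))

# Evaluate the fitted quadratic at a new point (extrapolation)
print(p(10))

# Inspect the residuals of the fit at the data points
residuals = [p(x) - y for x, y in zip(xs, ys)]
print(max(abs(r) for r in residuals))  # worst-case fit error
```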


For the multivariate case, or linear least squares in general, there is SciPy. As explained in its documentation, it takes a matrix A of the values fⱼ(xᵢ). (The theory is that it finds the Moore-Penrose pseudoinverse of A.) With our above example involving (xᵢ, yᵢ, Zᵢ), fitting a polynomial means the fⱼ are the monomials xᵃyᵇ. The following finds the best quadratic (or the best polynomial of any other degree, if you change the "degree = 2" line):

from scipy import linalg
import random

n = 20
x = [100 * random.random() for i in range(n)]
y = [100 * random.random() for i in range(n)]
Z = [(x[i] + y[i])**2 + 0.01 * random.random() for i in range(n)]

degree = 2
A = []  # A[i][j] = f_j(x_i): one row per point, one column per monomial
for i in range(n):
    A.append([])
    for xd in range(degree + 1):
        for yd in range(degree + 1 - xd):
            A[i].append((x[i]**xd) * (y[i]**yd))

c, _, _, _ = linalg.lstsq(A, Z)
j = 0
for xd in range(degree + 1):
    for yd in range(degree + 1 - xd):
        print(" + (%.2f)x^%dy^%d" % (c[j], xd, yd), end=" ")
        j += 1
print()

prints

 + (0.01)x^0y^0  + (-0.00)x^0y^1  + (1.00)x^0y^2  + (-0.00)x^1y^0  + (2.00)x^1y^1  + (1.00)x^2y^0


so it has discovered that the polynomial is x² + 2xy + y² + 0.01. [The last term is sometimes -0.01 and sometimes 0, which is to be expected because of the random noise we added.]


Alternatives to Python + NumPy/SciPy are R and computer algebra systems: Sage, Mathematica, Matlab, Maple. Even Excel might be able to do it. Numerical Recipes discusses methods for implementing it yourself (in C, Fortran).

  • It is strongly influenced by how the points are chosen. When I had x=y=range(20) instead of the random points, it always produced 1.33x² + 1.33xy + 1.33y², which was puzzling... until I realised that because I always had x[i]=y[i], the polynomials were the same: x² + 2xy + y² = 4x² = (4/3)(x² + xy + y²). So the moral is that it is important to choose the points carefully to get the "right" polynomial. (If you can choose, you should choose Chebyshev nodes for polynomial interpolation; not sure whether the same holds for least squares as well.)
  • Overfitting: higher-degree polynomials can always fit the data better. If you change the degree to 3 or 4 or 5, it still mostly recognizes the same quadratic polynomial (the coefficients of the higher-degree terms are 0), but for larger degrees it starts fitting higher-degree polynomials. Even with degree 6, though, taking a larger n (more data points, say 200 instead of 20) still fits the quadratic polynomial. So the moral is to avoid overfitting, and taking as many data points as possible may help.
  • There might be issues of numerical stability I don't fully understand.
  • If you don't need a polynomial, you can obtain better fits with other kinds of functions, e.g. splines (piecewise polynomials).
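
On that last point, here is a hypothetical sketch of a smoothing-spline fit with SciPy's UnivariateSpline (the smoothing factor s is a knob I've chosen arbitrarily, and the data is synthetic):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

np.random.seed(0)  # make the noise reproducible
xs = np.linspace(0, 10, 50)
zs = np.sin(xs) + 0.1 * np.random.random(50)

# A smoothing cubic spline (k=3); s trades off fit against smoothness
spline = UnivariateSpline(xs, zs, k=3, s=0.1)
print(spline(5.0))  # evaluate the fitted spline at x = 5
```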

