Polynomial Regression Nonsense Predictions


Question

Suppose I want to fit a linear regression model with a degree-two (orthogonal) polynomial and then predict the response. Here is the code for the first model (m1):

x=1:100
y=-2+3*x-5*x^2+rnorm(100)
m1=lm(y~poly(x,2))
prd.1=predict(m1,newdata=data.frame(x=105:110))

Now let's try the same model, but instead of using poly(x,2) I will use its columns, like this:

m2=lm(y~poly(x,2)[,1]+poly(x,2)[,2])
prd.2=predict(m2,newdata=data.frame(x=105:110))

Let's look at the summaries of m1 and m2:

> summary(m1)

Call:
lm(formula = y ~ poly(x, 2))

Residuals:
     Min       1Q   Median       3Q      Max 
-2.50347 -0.48752 -0.07085  0.53624  2.96516 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.677e+04  9.912e-02 -169168   <2e-16 ***
poly(x, 2)1 -1.449e+05  9.912e-01 -146195   <2e-16 ***
poly(x, 2)2 -3.726e+04  9.912e-01  -37588   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.9912 on 97 degrees of freedom
Multiple R-squared:     1,      Adjusted R-squared:     1 
F-statistic: 1.139e+10 on 2 and 97 DF,  p-value: < 2.2e-16 

> summary(m2)

Call:
lm(formula = y ~ poly(x, 2)[, 1] + poly(x, 2)[, 2])

Residuals:
     Min       1Q   Median       3Q      Max 
-2.50347 -0.48752 -0.07085  0.53624  2.96516 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -1.677e+04  9.912e-02 -169168   <2e-16 ***
poly(x, 2)[, 1] -1.449e+05  9.912e-01 -146195   <2e-16 ***
poly(x, 2)[, 2] -3.726e+04  9.912e-01  -37588   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.9912 on 97 degrees of freedom
Multiple R-squared:     1,      Adjusted R-squared:     1 
F-statistic: 1.139e+10 on 2 and 97 DF,  p-value: < 2.2e-16 

So m1 and m2 are essentially the same model. Now let's look at the predictions prd.1 and prd.2:

> prd.1
        1         2         3         4         5         6 
-54811.60 -55863.58 -56925.56 -57997.54 -59079.52 -60171.50 

> prd.2
         1          2          3          4          5          6 
  49505.92   39256.72   16812.28  -17827.42  -64662.35 -123692.53 

Q1: Why is prd.2 so different from prd.1?

Q2: How can I obtain prd.1 using model m2?

Answer

m1 is the right way to do this. m2 is entering a whole world of pain...

To make predictions from m2, the model would need to know that it was fitted to an orthogonal set of basis functions, so that it could evaluate the same basis functions at the new, extrapolated data values. Compare poly(1:10,2)[,2] with poly(1:12,2)[,2]: the first ten values are not the same, because the basis is constructed from the particular x values supplied. If you fit the model explicitly with poly(x,2), then predict understands all of this and does the right thing.
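For instance (a quick check of this claim, not part of the original answer), you can see the mismatch directly:

p10 = poly(1:10, 2)[, 2]
p12 = poly(1:12, 2)[, 2]
# The orthogonal basis is centered and scaled using the actual x values
# supplied, so the first ten entries of p12 do not match p10
cbind(p10, p12[1:10])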

What you have to do is make sure your prediction locations are transformed using the same set of basis functions that was used to create the model in the first place. You can use predict.poly for this (note that I call my explanatory variables x1 and x2 so that it's easy to match the names up):

px = poly(x,2)
x1 = px[,1]
x2 = px[,2]

m3 = lm(y~x1+x2)

newx = 90:110
pnew = predict(px,newx) # px is the previous poly object, so this calls predict.poly

prd.3 = predict(m3, newdata=data.frame(x1=pnew[,1],x2=pnew[,2]))
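Equivalently (a sketch based on poly's documented coefs argument, not part of the original answer), you can rebuild the basis at the new locations yourself by reusing the orthogonalization constants that poly stored on px:

# poly() records its centering/normalization constants in attr(px, "coefs");
# passing them back via the coefs argument reproduces the same basis at new x values
pnew2 = poly(newx, 2, coefs = attr(px, "coefs"))
prd.4 = predict(m3, newdata = data.frame(x1 = pnew2[, 1], x2 = pnew2[, 2]))

This is essentially what predict.poly does internally, which is why prd.4 should agree with prd.3.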
