`poly()` 如何生成正交多项式?如何理解“系数"回? [英] How `poly()` generates orthogonal polynomials? How to understand the "coefs" returned?

查看:14
本文介绍了`poly()` 如何生成正交多项式?如何理解“系数"回?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对正交多项式的理解是它们的形式

y(x) = a1 + a2(x - c1) + a3(x - c2)(x - c3) + a4(x - c4)(x - c5)(x - c6)... 达到所需的术语数量

其中 a1a2 etc 是每个正交项的系数(因拟合而异),c1, c2 etc 是正交项内的系数,确定为使项保持正交性(使用相同的 x 值在拟合之间保持一致)

我了解 poly() 用于拟合正交多项式.一个例子

x = c(1.160, 1.143, 1.126, 1.109, 1.079, 1.053, 1.040, 1.027, 1.015, 1.004, 0.994, 0.985, # 0.977 等间距 顺式 0.977y = c(1.217395, 1.604360, 2.834947, 4.585687, 8.770932, 9.996260, 9.264800, 9.155079, 7.949278, 69, 64, 64, 64, 36, 7.36, 7.39# 构造正交多项式orth_poly <- poly(x, degree = 5)# 拟合 y 到正交多项式模型 <- lm(y ~ orth_poly)

我想同时提取系数a1a2 etc,以及正交系数c1c2 .我不知道该怎么做.我的猜测是

model$coefficients

返回第一组系数,但我正在努力解决如何提取其他系数.也许在

attributes(orth_poly)$coefs

?

非常感谢.

解决方案

我刚刚意识到有一个密切相关的问题


第 2 部分:解释 poly()

的输出

让我们考虑一个例子.在你的帖子中使用 x

X <- poly(x, degree = 5)# 1 2 3 4 5# [1,] 0.484259711 0.48436462 0.48074040 0.351250507 0.25411350# [2,] 0.406027697 0.20038942 -0.06236564 -0.303377083 -0.46801416# [3,] 0.327795682 -0.02660187 -0.34049024 -0.338222850 -0.11788140# ... ... ... ... ... ...#[12,] -0.321069852 0.28705108 -0.15397819 -0.006975615 0.16978124#[13,] -0.357884918 0.42236400 -0.40180712 0.398738364 -0.34115435#attr(,系数")#attr(,"coefs")$alpha#[1] 1.054769 1.078794 1.063917 1.075700 1.063079##attr(,"coefs")$norm2#[1] 1.000000e+00 1.300000e+01 4.722031e-02 1.028848e-04 2.550358e-07#[6] 5.567156e-10 1.156628e-12

以下是这些属性:

  • alpha[1] 给出 x_bar = mean(x),即中心;
  • alpha - alpha[1] 给出 alpha0, alpha1, ..., alpha4 (alpha4>alpha5 被计算但在 poly 返回 X 之前被删除,因为它不会在 predict.poly 中使用);
  • norm2 的第一个值总是 1.倒数第二个是 l0, l1, ..., l5,给出X的平方列范数;l0 是删除的 P0(x - x_bar) 的列平方范数,它总是 n(即 length(x));而第一个 1 只是被填充,以便递归在 predict.poly 内进行.
  • beta0, beta1, beta2, ..., beta_5 不返回,但可以计算通过 norm2[-1]/norm2[-length(norm2)].

第 3 部分:使用 QR 分解和递归算法实现 poly

如前所述,poly 不使用递归,而 predict.poly 使用.我个人不明白这种不一致设计背后的逻辑/原因.在这里,我将提供一个自己编写的函数 my_poly,它使用递归来生成矩阵,如果 QR = FALSE.当 QR = TRUE 时,它是一个类似但不完全相同的实现 poly.代码注释得很好,有助于你理解这两种方法.

## 返回数据`x`的模型矩阵my_poly <- 函数(x,度数 = 1,QR = TRUE){##检查可行性如果(长度(唯一(x))<度数)停止(指定程度的唯一数据点不足!")##居中协变量(以便`x`与截距正交)中心 <- 平均值(x)x <- x - 中心如果(QR){##普通多项式设计矩阵的QR分解QR <- qr(outer(x, 0:degree, "^"))## X <- qr.Q(QR) * rep(diag(QR$qr), each = length(x))##即,通过`diag(R)`对Q因子的列重新缩放##也删除拦截X <- qr.qy(QR, diag(diag(QR$qr), length(x), degree + 1))[, -1, drop = FALSE]## 现在 `X` 的列彼此正交## 即,`crossprod(X)` 是对角线X2 <- X * Xnorm2 <- colSums(X * X) ##平方L2范数alpha <- drop(crossprod(X2, x))/norm2beta <- norm2/(c(length(x), norm2[-degree]))列名(X) <- 1:degree}别的 {beta <- alpha <- norm2 <- numeric(degree)## 在所有列上重复第一个多项式 `x` 以初始化设计矩阵 XX <-矩阵(x,nrow = length(x),ncol = degree,dimnames = list(NULL,1:degree))## 计算 alpha[1] 和 beta[1]norm2[1] <- new_norm <- drop(crossprod(x))alpha[1] <- sum(x ^ 3)/new_normbeta[1] <- new_norm/长度(x)如果(度数> 1L){old_norm <- new_norm##第二多项式X[, 2] <- Xi <- (x - alpha[1]) * X[, 1] - beta[1]norm2[2] <- new_norm <- drop(crossprod(Xi))alpha[2] <- drop(crossprod(Xi * Xi, x))/new_normbeta[2] <- new_norm/old_normold_norm <- new_norm##从递归中获得的进一步多项式我<- 3而(我<=度){X[, i] <- Xi <- (x - alpha[i - 1]) * X[, i - 1] - beta[i - 1] * X[, i - 2]norm2[i] <- new_norm <- drop(crossprod(Xi))alpha[i] <- drop(crossprod(Xi * Xi, x))/new_normbeta[i] <- new_norm/old_normold_norm <- new_normi <- i + 1}}}## 列重新缩放,以便 `crossprod(X)` 是一个单位矩阵规模 <- sqrt(norm2)X <- X * rep(1/scale, each = length(x))## 添加属性并返回attr(X, "coefs") <- list(center = center, scale = scale, alpha = alpha[-degree], beta = beta[-degree])X}


第 4 部分:my_poly

的输出说明

X <- my_poly(x, 5, FALSE)

结果矩阵与 poly 生成的矩阵相同,因此被排除在外.属性不一样.

#attr(,系数")#attr(,coefs")$centre#[1] 1.054769#attr(,"coefs")$scale#[1] 2.173023e-01 1.014321e-02 5.050106e-04 2.359482e-05 1.075466e-06#attr(,"coefs")$alpha#[1] 0.024025005 0.009147498 0.020930616 0.008309835#attr(,"coefs")$beta#[1] 0.003632331 0.002178825 0.002478848 0.002182892

my_poly 更明显地返回构造信息:

  • center 给出 x_bar = mean(x);
  • scale 给出列范数(poly 返回的 norm2 的平方根);
  • alpha 给出 alpha1, alpha2, alpha3, alpha4;
  • beta 给出 beta1beta2beta3beta4.

第 5 部分:my_poly

的预测例程

由于 my_poly 返回不同的属性,stats::predict.polymy_poly 不兼容.这是适当的例程my_predict_poly:

## 返回一个线性预测矩阵,给定一个模型矩阵`X`和新数据`x`my_predict_poly <- 函数 (X, x) {##提取施工信息coefs <- attr(X, coefs")中心 <- coefs$centeralpha <- coefs$alphabeta <- coefs$beta度数<- ncol(X)## 居中`x`x <- x - coefs$centre## 在所有列上重复第一个多项式 `x` 以初始化设计矩阵 XX <-矩阵(x,长度(x),度数,dimnames = list(NULL,1:度))如果(度数> 1L){##第二多项式X[, 2] <- (x - alpha[1]) * X[, 1] - beta[1]##从递归中获得的进一步多项式我<- 3而(我<=度){X[, i] <- (x - alpha[i - 1]) * X[, i - 1] - beta[i - 1] * X[, i - 2]i <- i + 1}}## 列重新缩放,以便 `crossprod(X)` 是一个单位矩阵X * rep(1/coefs$scale, each = length(x))}

举个例子:

set.seed(0);x1 <- runif(5, min(x), max(x))

stats:::predict.poly(poly(x, 5), x1)my_predict_poly(my_poly(x, 5, FALSE), x1)

给出完全相同的结果预测矩阵:

# 1 2 3 4 5#[1,] 0.39726381 0.1721267 -0.10562568 -0.3312680 -0.4587345#[2,] -0.13428822 -0.2050351 0.28374304 -0.0858400 -0.2202396#[3,] -0.04450277 -0.3259792 0.16493099 0.2393501 -0.2634766#[4,] 0.12454047 -0.3499992 -0.24270235 0.3411163 0.3891214#[5,] 0.40695739 0.2034296 -0.05758283 -0.2999763 -0.4682834

请注意,预测例程只是采用现有的构造信息,而不是重构多项式.


第 6 节:将 polypredict.poly 视为黑匣子

很少需要了解里面的一切.对于统计建模,知道poly 为模型拟合构造多项式基就足够了,其系数可以在lmObject$coefficients 中找到.在进行预测时,predict.poly 永远不需要被用户调用,因为 predict.lm 会为你做.这样,将polypredict.poly当成一个黑盒子就完全可以了.

My understanding of orthogonal polynomials is that they take the form

y(x) = a1 + a2(x - c1) + a3(x - c2)(x - c3) + a4(x - c4)(x - c5)(x - c6)... up to the number of terms desired

where a1, a2 etc are coefficients to each orthogonal term (vary between fits), and c1, c2 etc are coefficients within the orthogonal terms, determined such that the terms maintain orthogonality (consistent between fits using the same x values)

I understand poly() is used to fit orthogonal polynomials. An example

x = c(1.160, 1.143, 1.126, 1.109, 1.079, 1.053, 1.040, 1.027, 1.015, 1.004, 0.994, 0.985, 0.977) # abscissae not equally spaced

y = c(1.217395, 1.604360, 2.834947, 4.585687, 8.770932, 9.996260, 9.264800, 9.155079, 7.949278, 7.317690, 6.377519, 6.409620, 6.643426)

# construct the orthogonal polynomial
orth_poly <- poly(x, degree = 5)

# fit y to orthogonal polynomial
model <- lm(y ~ orth_poly) 

I would like to extract both the coefficients a1, a2 etc, as well as the orthogonal coefficients c1, c2 etc. I'm not sure how to do this. My guess is that

model$coefficients

returns the first set of coefficients, but I'm struggling with how to extract the others. Perhaps within

attributes(orth_poly)$coefs

?

Many thanks.

解决方案

I have just realized that there was a closely related question Extracting orthogonal polynomial coefficients from R's poly() function? 2 years ago. The answer there is merely explaining what predict.poly does, but my answer gives a complete picture.


Section 1: How does poly represent orthogonal polynomials

My understanding of orthogonal polynomials is that they take the form

y(x) = a1 + a2(x - c1) + a3(x - c2)(x - c3) + a4(x - c4)(x - c5)(x - c6)... up to the number of terms desired

No no, there is no such clean form. poly() generates monic orthogonal polynomials which can be represented by the following recursion algorithm. This is how predict.poly generates linear predictor matrix. Surprisingly, poly itself does not use such recursion but use a brutal force: QR factorization of model matrix of ordinary polynomials for orthogonal span. However, this is equivalent to the recursion.


Section 2: Explanation of the output of poly()

Let's consider an example. Take the x in your post,

X <- poly(x, degree = 5)

#                 1           2           3            4           5
# [1,]  0.484259711  0.48436462  0.48074040  0.351250507  0.25411350
# [2,]  0.406027697  0.20038942 -0.06236564 -0.303377083 -0.46801416
# [3,]  0.327795682 -0.02660187 -0.34049024 -0.338222850 -0.11788140
# ...           ...          ...        ...          ...         ...
#[12,] -0.321069852  0.28705108 -0.15397819 -0.006975615  0.16978124
#[13,] -0.357884918  0.42236400 -0.40180712  0.398738364 -0.34115435
#attr(,"coefs")
#attr(,"coefs")$alpha
#[1] 1.054769 1.078794 1.063917 1.075700 1.063079
# 
#attr(,"coefs")$norm2
#[1] 1.000000e+00 1.300000e+01 4.722031e-02 1.028848e-04 2.550358e-07
#[6] 5.567156e-10 1.156628e-12

Here is what those attributes are:

  • alpha[1] gives the x_bar = mean(x), i.e., the centre;
  • alpha - alpha[1] gives alpha0, alpha1, ..., alpha4 (alpha5 is computed but dropped before poly returns X, as it won't be used in predict.poly);
  • The first value of norm2 is always 1. The second to the last are l0, l1, ..., l5, giving the squared column norm of X; l0 is the column squared norm of the dropped P0(x - x_bar), which is always n (i.e., length(x)); while the first 1 is just padded in order for the recursion to proceed inside predict.poly.
  • beta0, beta1, beta2, ..., beta_5 are not returned, but can be computed by norm2[-1] / norm2[-length(norm2)].

Section 3: Implementing poly using both QR factorization and recursion algorithm

As mentioned earlier, poly does not use recursion, while predict.poly does. Personally I don't understand the logic / reason behind such inconsistent design. Here I would offer a function my_poly written myself that uses recursion to generate the matrix, if QR = FALSE. When QR = TRUE, it is a similar but not identical implementation poly. The code is very well commented, helpful for you to understand both methods.

## return a model matrix for data `x`
my_poly <- function (x, degree = 1, QR = TRUE) {
  ## check feasibility
  if (length(unique(x)) < degree)
    stop("insufficient unique data points for specified degree!")
  ## centring covariates (so that `x` is orthogonal to intercept)
  centre <- mean(x)
  x <- x - centre
  if (QR) {
    ## QR factorization of design matrix of ordinary polynomial
    QR <- qr(outer(x, 0:degree, "^"))
    ## X <- qr.Q(QR) * rep(diag(QR$qr), each = length(x))
    ## i.e., column rescaling of Q factor by `diag(R)`
    ## also drop the intercept
    X <- qr.qy(QR, diag(diag(QR$qr), length(x), degree + 1))[, -1, drop = FALSE]
    ## now columns of `X` are orthorgonal to each other
    ## i.e., `crossprod(X)` is diagonal
    X2 <- X * X
    norm2 <- colSums(X * X)    ## squared L2 norm
    alpha <- drop(crossprod(X2, x)) / norm2
    beta <- norm2 / (c(length(x), norm2[-degree]))
    colnames(X) <- 1:degree
    } 
  else {
    beta <- alpha <- norm2 <- numeric(degree)
    ## repeat first polynomial `x` on all columns to initialize design matrix X
    X <- matrix(x, nrow = length(x), ncol = degree, dimnames = list(NULL, 1:degree))
    ## compute alpha[1] and beta[1]
    norm2[1] <- new_norm <- drop(crossprod(x))
    alpha[1] <- sum(x ^ 3) / new_norm
    beta[1] <- new_norm / length(x)
    if (degree > 1L) {
      old_norm <- new_norm
      ## second polynomial
      X[, 2] <- Xi <- (x - alpha[1]) * X[, 1] - beta[1]
      norm2[2] <- new_norm <- drop(crossprod(Xi))
      alpha[2] <- drop(crossprod(Xi * Xi, x)) / new_norm
      beta[2] <- new_norm / old_norm
      old_norm <- new_norm
      ## further polynomials obtained from recursion
      i <- 3
      while (i <= degree) {
        X[, i] <- Xi <- (x - alpha[i - 1]) * X[, i - 1] - beta[i - 1] * X[, i - 2]
        norm2[i] <- new_norm <- drop(crossprod(Xi))
        alpha[i] <- drop(crossprod(Xi * Xi, x)) / new_norm
        beta[i] <- new_norm / old_norm
        old_norm <- new_norm
        i <- i + 1
        }
      }
    }
  ## column rescaling so that `crossprod(X)` is an identity matrix
  scale <- sqrt(norm2)
  X <- X * rep(1 / scale, each = length(x))
  ## add attributes and return
  attr(X, "coefs") <- list(centre = centre, scale = scale, alpha = alpha[-degree], beta = beta[-degree])
  X
  }


Section 4: Explanation of the output of my_poly

X <- my_poly(x, 5, FALSE)

The resulting matrix is as same as what is generated by poly hence left out. The attributes are not the same.

#attr(,"coefs")
#attr(,"coefs")$centre
#[1] 1.054769

#attr(,"coefs")$scale
#[1] 2.173023e-01 1.014321e-02 5.050106e-04 2.359482e-05 1.075466e-06

#attr(,"coefs")$alpha
#[1] 0.024025005 0.009147498 0.020930616 0.008309835

#attr(,"coefs")$beta
#[1] 0.003632331 0.002178825 0.002478848 0.002182892

my_poly returns construction information more apparently:

  • centre gives x_bar = mean(x);
  • scale gives column norms (the square root of norm2 returned by poly);
  • alpha gives alpha1, alpha2, alpha3, alpha4;
  • beta gives beta1, beta2, beta3, beta4.

Section 5: Prediction routine for my_poly

Since my_poly returns different attributes, stats:::predict.poly is not compatible with my_poly. Here is the appropriate routine my_predict_poly:

## return a linear predictor matrix, given a model matrix `X` and new data `x`
my_predict_poly <- function (X, x) {
  ## extract construction info
  coefs <- attr(X, "coefs")
  centre <- coefs$centre
  alpha <- coefs$alpha
  beta <- coefs$beta
  degree <- ncol(X)
  ## centring `x`
  x <- x - coefs$centre
  ## repeat first polynomial `x` on all columns to initialize design matrix X
  X <- matrix(x, length(x), degree, dimnames = list(NULL, 1:degree))
  if (degree > 1L) {
    ## second polynomial
    X[, 2] <- (x - alpha[1]) * X[, 1] - beta[1]
    ## further polynomials obtained from recursion
    i <- 3
    while (i <= degree) {
      X[, i] <- (x - alpha[i - 1]) * X[, i - 1] - beta[i - 1] * X[, i - 2]
      i <- i + 1
      }
    }
  ## column rescaling so that `crossprod(X)` is an identity matrix
  X * rep(1 / coefs$scale, each = length(x))
  }

Consider an example:

set.seed(0); x1 <- runif(5, min(x), max(x))

and

stats:::predict.poly(poly(x, 5), x1)
my_predict_poly(my_poly(x, 5, FALSE), x1)

give exactly the same result predictor matrix:

#               1          2           3          4          5
#[1,]  0.39726381  0.1721267 -0.10562568 -0.3312680 -0.4587345
#[2,] -0.13428822 -0.2050351  0.28374304 -0.0858400 -0.2202396
#[3,] -0.04450277 -0.3259792  0.16493099  0.2393501 -0.2634766
#[4,]  0.12454047 -0.3499992 -0.24270235  0.3411163  0.3891214
#[5,]  0.40695739  0.2034296 -0.05758283 -0.2999763 -0.4682834

Be aware that prediction routine simply takes the existing construction information rather than reconstructing polynomials.


Section 6: Just treat poly and predict.poly as a black box

There is rarely the need to understand everything inside. For statistical modelling it is sufficient to know that poly constructs polynomial basis for model fitting, whose coefficients can be found in lmObject$coefficients. When making prediction, predict.poly never needs be called by user since predict.lm will do it for you. In this way, it is absolutely OK to just treat poly and predict.poly as a black box.

这篇关于`poly()` 如何生成正交多项式?如何理解“系数"回?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆