R car :: Anova的更快替代品，用于预测变量子集的平方叉积矩阵求和 [英] Faster alternative to R car::Anova for sum of square crossproduct matrix calculation for subsets of predictors

查看：40 发布时间：2021/5/30 19:42:00 r performance matrix lm manova

本文介绍了R car :: Anova的更快替代品，用于预测变量子集的平方叉积矩阵求和的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我需要在Y(n x q)和X(n x p)的多元线性模型中计算叉积平方和(实际上是该矩阵的迹线).为此的标准R代码是:

I need to compute the sum of squares crossproduct matrix (indeed the trace of this matrix) in a multivariate linear model, with Y (n x q) and X (n x p). Standard R code for doing that is:

require(MASS)
require(car)

# Example data 
q <- 10
n  <- 1000
p <- 10
Y <- mvrnorm(n, mu = rep(0, q), Sigma = diag(q))
X <- as.data.frame(mvrnorm(n, mu = rnorm(p), Sigma = diag(p)))

# Fit lm
fit <- lm( Y ~ ., data = X )

# Type I sums of squares
summary(manova(fit))$SS    

# Type III sums of squares
type = 3 # could be also 2 (II)
car::Anova(fit, type = type)$SSP

必须执行数千次，不幸的是，当预测变量的数量相对较大时，它会变慢.通常，我只对 s 个预测变量的子集感兴趣，因此我尝试重新实现此计算.尽管我的实现直接为 s = 1(以下)翻译线性代数对于较小的样本量(n)更快，

This has to be done thousands of times, unfortunately, it gets slow when the number of predictors is relatively large. As often I am interested only in a subset of s predictors, I tried to re-implement this calculation. Although my implementation directly translating linear algebra for s = 1 (below) is faster for small sample sizes (n),

# Hat matrix (X here stands for the actual design matrix)
H <- tcrossprod(tcrossprod(X, solve(crossprod(X))), X)

# Remove predictor of interest (e.g. 2)
X.r <- X[, -2]  
H1 <- tcrossprod(tcrossprod(X.r, solve(crossprod(X.r))), X.r) 

# Compute e.g. type III sum of squares
SS <- crossprod(Y, H - H1) %*% Y

car 对于较大的n来说，运行速度仍然更快:

car still goes faster for large n:

我已经尝试了 Rcpp 的实现，该实现非常成功，因为R中的这些矩阵产品已经使用了非常有效的代码.

I already tried Rcpp implementation which much success, as these matrix products in R already use a very efficient code.

关于如何更快地执行此操作的任何提示?

Any hint on how to do this faster?

更新

阅读答案后，我尝试了此

After reading the answers, I tried the solution proposed in this post which relies on QR/SVD/Cholesky factorization for hat matrix calculation. However it seems that car::Anova is still faster to compute all p = 30 matrices than me computing just one (s = 1)!! for e.g. n = 5000, q = 10:

Unit: milliseconds
 expr       min        lq      mean    median        uq       max neval
   ME 1137.5692 1202.9888 1257.8979 1251.6834 1318.9282 1398.9343    10
   QR 1005.9082 1031.9911 1084.5594 1037.5659 1095.7449 1364.9508    10
  SVD 1026.8815 1065.4629 1152.6631 1087.9585 1241.4977 1446.8318    10
 Chol  969.9089 1056.3093 1115.9608 1102.1169 1210.7782 1267.1274    10
  CAR  205.1665  211.8523  218.6195  214.6761  222.0973  242.4617    10

更新2

目前最好的解决方案是查看 car::Anova

The best solution for now was to go over the car::Anova code (i.e. functions car:::Anova.III.mlm and subsequently car:::linearHypothesis.mlm) and re-implement them to account for a subset of predictors, instead of all of them.

car 的相关代码如下(我跳过了检查，并简化了一点):

The relevant code by car is as follows (I skipped checks, and simplified a bit):

B <- coef(fit)                    # Model coefficients
M <- model.matrix(fit)            # Model matrix M
V <- solve(crossprod(M))          # M'M
p <- ncol(M)                      # Number of predictors in M
I.p <- diag(p)                    # Identity (p x p)
terms <- labels(terms(fit))       # terms (add intercept)       
terms <- c("(Intercept)", terms)   
n.terms <- length(terms)
assign <- fit$assign              # assignation terms <-> p variables
  
SSP <- as.list(rep(0, n.terms))   # Initialize empty list for sums of squares cross-product matrices
names(SSP) <- terms
  
for (term in 1:n.terms){
    subs <- which(assign == term - 1)
    L <- I.p[subs, , drop = FALSE]
    SSP[[term]] <- t(L %*% B) %*% solve(L %*% V %*% t(L)) %*% (L %*% B)
}

然后，只需选择条款的子集即可.

Then it is just a matter of selecting the subset of terms.

R car :: Anova的更快替代品，用于预测变量子集的平方叉积矩阵求和 [英] Faster alternative to R car::Anova for sum of square crossproduct matrix calculation for subsets of predictors

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

R car :: Anova的更快替代品，用于预测变量子集的平方叉积矩阵求和 [英] Faster alternative to R car::Anova for sum of square crossproduct matrix calculation for subsets of predictors

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭