Run lm with multiple responses and weights


Problem Description

I have to fit a linear model with the same model matrix to multiple responses. This can easily be done in R by specifying the response as a matrix instead of a vector, and computation is very fast that way.
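
For context, here is a minimal sketch (illustrative data, not from the question) of this unweighted "mlm" trick: with a matrix on the left-hand side, a single lm() call reuses one QR decomposition of the model matrix for all responses.

## minimal sketch of the unweighted "mlm" case (illustrative data)
set.seed(1)
x <- rnorm(100)
Y <- matrix(rnorm(100 * 5), 100)  # 5 responses sharing the same predictor
mfit <- lm(Y ~ x)                 # class c("mlm", "lm"): one QR for all columns
coef(mfit)                        # one coefficient column per response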

Now I would also like to add weights to the model that correspond to the accuracy of the responses, so for each response vector I would also need a different weight vector. However, lm accepts weights only as a vector, not as a matrix, so I cannot enter the weights in batch and would have to run lm for every response separately. That way the calculations become much slower.

Is there any way to run these types of models in batch mode, without calling lm repeatedly?

Recommended Answer

Now I would also like to add weights to the model that correspond to the accuracy of the responses, so for each response vector I would also need a different weight vector. However, lm accepts weights only as a vector, not as a matrix, so I cannot enter the weights in batch and would have to run lm for every response separately. That way the calculations become much slower.

As explained in Fitting a linear model with multiple LHS, the efficiency of "mlm" relies on a model matrix shared by all LHS responses. Weighted regression, however, cannot reuse the model matrix: for a different set of weights, both the response y and the model matrix X need to be rescaled. Have a read of R: lm() result differs when using weights argument and when using manually reweighted data to see how weighted regression works.
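
A minimal sketch of that rescaling (illustrative data, my own, not from the answer): weighted least squares with weights w is ordinary least squares on sqrt(w)-rescaled data, so every weight vector implies its own rescaled model matrix.

## WLS with weights `w` equals OLS on sqrt(w)-rescaled data (illustrative)
set.seed(2)
x <- rnorm(50); y <- rnorm(50); w <- runif(50)
f1 <- lm(y ~ x, weights = w)              # built-in weighted fit
rw <- sqrt(w)
f2 <- lm(I(rw * y) ~ 0 + rw + I(rw * x))  # rescale y and each column of X
all.equal(unname(coef(f1)), unname(coef(f2)))  # TRUE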

Is there any way to run these types of models in batch mode, without calling lm repeatedly?

It depends on what you want. If you need a full lmObject, then you have to call lm every time. If you only want coefficients, you may use .lm.fit. The second link above demonstrates the use of lm.fit, and the use of .lm.fit is almost identical. A simple template might be the following:

## weighted mlm, by specifying matrix directly
## `xmat`: non-weighted model matrix, manually created from `model.matrix`
## `ymat`: non-weighted response matrix
## `wmat`: matrix of weights

## all matrices must have the same number of rows (not checked)
## `ymat` and `wmat` must have the same number of columns (not checked)
## no `NA` values are allowed anywhere (not checked)
## all elements of `wmat` must be strictly positive (not checked)

wmlm <- function (xmat, ymat, wmat) {
  N <- ncol(ymat)
  wmlmList <- vector("list", length = N)
  for (j in 1:N) {
    rw <- sqrt(wmat[, j])
    wmlmList[[j]] <- .lm.fit(rw * xmat, rw * ymat[, j])
  }
  return(wmlmList)
}

Consider a simple example of its use:

## a toy dataset of 200 data with 3 numerical variables and 1 factor variable
dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200), f = gl(5, 40, labels = letters[1:5]))

## consider a model `~ x1 + poly(x3, 3) + x2 * f`
## we construct the non-weighted model matrix
xmat <- model.matrix(~ x1 + poly(x3, 3) + x2 * f, dat)

## now let's assume we have 100 model responses as well as 100 sets of weights
ymat <- matrix(rnorm(200 * 100), 200)
wmat <- matrix(runif(200 * 100), 200)

## Let's call `wmlm`:
fit <- wmlm(xmat, ymat, wmat)
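
As a small follow-up not in the original answer, the per-response coefficients can be collected into a single matrix:

## collect the fitted coefficients into one matrix, one column per response
betas <- sapply(fit, `[[`, "coefficients")
dim(betas)  # 14 x 100: 14 model-matrix columns, 100 responses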

.lm.fit returns the critical information needed for further model inference, and a complete lmObject would inherit most of these entries.

## take the first fitted model as an example
str(fit[[1]])
 #$ qr          : num [1:200, 1:14] -10.4116 0.061 0.0828 0.0757 0.0698 ...
 # ..- attr(*, "assign")= int [1:14] 0 1 2 2 2 3 4 4 4 4 ...
 # ..- attr(*, "contrasts")=List of 1
 # .. ..$ f: chr "contr.treatment"
 # ..- attr(*, "dimnames")=List of 2
 # .. ..$ : chr [1:200] "1" "2" "3" "4" ...
 # .. ..$ : chr [1:14] "(Intercept)" "x1" "poly(x3, 3)1" "poly(x3, 3)2" ...
 #$ coefficients: num [1:14] 0.1184 -0.0506 0.3032 0.1643 0.4269 ...
 #$ residuals   : num [1:200] -0.7311 -0.0795 -0.2495 0.4097 0.0495 ...
 #$ effects     : num [1:200] -0.351 -0.36 0.145 0.182 0.291 ...
 #$ rank        : int 14
 #$ pivot       : int [1:14] 1 2 3 4 5 6 7 8 9 10 ...
 #$ qraux       : num [1:14] 1.06 1.13 1.07 1.05 1.01 ...
 #$ tol         : num 1e-07
 #$ pivoted     : logi FALSE

The result of .lm.fit has no supported generic functions such as summary, anova, predict, plot, etc. But inference for a linear model is easy, so it is straightforward to compute yourself (provided you know the theory behind it; a sketch follows the list below):

  1. the t-statistic table for regression coefficients (from $qr);
  2. the F-statistic and ANOVA table (from $effects);
  3. the residual standard error, R-squared and adjusted R-squared (from $residuals and $rank).
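
For example, here is a minimal sketch (my own, not the answer's code) of item 1, assuming a full-rank, non-pivoted fit like `fit[[1]]` above (its str output shows rank 14 and pivoted = FALSE):

## hedged sketch: coefficient standard errors and t-statistics from one
## `.lm.fit` result, assuming full rank and no pivoting
z <- fit[[1]]
n <- length(z$residuals); p <- z$rank
sigma2 <- sum(z$residuals ^ 2) / (n - p)  # residual variance estimate
R <- z$qr[1:p, 1:p]                       # upper triangle is the R factor of X = QR
XtXinv <- chol2inv(R)                     # (X'X)^{-1} from the R factor
se <- sqrt(sigma2 * diag(XtXinv))         # coefficient standard errors
tstat <- z$coefficients[1:p] / se         # t-statistics with df = n - p
pval <- 2 * pt(abs(tstat), df = n - p, lower.tail = FALSE)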


最后,我提供一个基准:


Finally, I offer a benchmark:

library(microbenchmark)
microbenchmark(wmlm = {wmlm(xmat, ymat, wmat)},
               lm = {for (i in 1:ncol(ymat))
                       lm(ymat[, i] ~ x1 + poly(x3, 3) + x2 * f, dat, weights = wmat[, i])})

#Unit: milliseconds
# expr       min        lq      mean    median       uq      max neval cld
# wmlm  20.84512  23.02756  27.29539  24.49314  25.9027  79.8894   100  a 
#   lm 400.53000 405.10622 430.09787 414.42152 442.2640 535.9144   100   b

So roughly a 17-fold speed-up is seen (414.42 / 24.49 ≈ 16.9, based on median times).

