Run lm with multiple responses and weights
Question
I have to fit a linear model with the same model matrix to multiple responses. This can easily be done in R by specifying the response as a matrix instead of a vector, and computation is very fast this way.
Now I would also like to add weights to the model that correspond to the accuracy of the responses, so for each response vector I would also need a different weight vector. However, lm only allows the weights to be entered as a vector, not as a matrix. So I could not enter the weights in batch and would have to run lm for every response separately. This way the calculations would become much slower.
Is there any way to run these types of models in batch mode, without calling lm repeatedly?
Answer
Now I would also like to add weights to the model that correspond to the accuracy of the responses, so for each response vector I would also need a different weight vector. However, lm only allows the weights to be entered as a vector, not as a matrix. So I could not enter the weights in batch and would have to run lm for every response separately. This way the calculations would become much slower.
As explained in Fitting a linear model with multiple LHS, the efficiency of "mlm" relies on a shared model matrix for all LHS responses. Weighted regression, however, allows no such reuse of the model matrix: for a different set of weights, both the response y and the model matrix X need to be rescaled. Have a read of R: lm() result differs when using weights argument and when using manually reweighted data to see how weighted regression works.
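The rescaling can be sketched directly in base R. This is a minimal, hedged illustration (toy data and variable names are mine): weighted least squares on (X, y, w) equals ordinary least squares on the square-root-weighted data.

```r
## weighted least squares on (X, y, w) is ordinary least squares
## on (sqrt(w) * X, sqrt(w) * y) -- a minimal sketch
set.seed(1)
X <- cbind(1, rnorm(10))   ## toy model matrix with an intercept
y <- rnorm(10)
w <- runif(10)             ## strictly positive weights
b1 <- lm.wfit(X, y, w)$coefficients
b2 <- lm.fit(sqrt(w) * X, sqrt(w) * y)$coefficients
all.equal(unname(b1), unname(b2))
## [1] TRUE
```

Since `w` is a length-n vector, `sqrt(w) * X` recycles it down each column, rescaling every row of the model matrix.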
It depends on what you want. If you need a full lmObject, then you have to call lm every time. If you only want coefficients, you may use .lm.fit. The 2nd link above demonstrates the use of lm.fit, and the use of .lm.fit is almost identical. A simple template may be the following:
## weighted mlm, by specifying matrices directly
## `xmat`: non-weighted model matrix, manually created from `model.matrix`
## `ymat`: non-weighted response matrix
## `wmat`: matrix of weights
## all matrices must have the same number of rows (not checked)
## `ymat` and `wmat` must have the same number of columns (not checked)
## no `NA` values anywhere are allowed (not checked)
## all elements of `wmat` must be strictly positive (not checked)
wmlm <- function (xmat, ymat, wmat) {
  N <- ncol(ymat)
  wmlmList <- vector("list", length = N)
  for (j in 1:N) {
    rw <- sqrt(wmat[, j])
    wmlmList[[j]] <- .lm.fit(rw * xmat, rw * ymat[, j])
  }
  return(wmlmList)
}
Consider a simple example of its use:
## a toy dataset of 200 observations with 3 numerical variables and 1 factor variable
dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200), f = gl(5, 40, labels = letters[1:5]))
## consider a model `~ x1 + poly(x3, 3) + x2 * f`
## we construct the non-weighted model matrix
xmat <- model.matrix (~ x1 + poly(x3, 3) + x2 * f, dat)
## now let's assume we have 100 model responses as well as 100 sets of weights
ymat <- matrix(rnorm(200 * 100), 200)
wmat <- matrix(runif(200 * 100), 200)
## Let's call `wmlm`:
fit <- wmlm (xmat, ymat, wmat)
.lm.fit returns the critical information for further model inference, and a complete lmObject would inherit most of these entries.
## take the first fitted model as an example
str(fit[[1]])
#$ qr : num [1:200, 1:14] -10.4116 0.061 0.0828 0.0757 0.0698 ...
# ..- attr(*, "assign")= int [1:14] 0 1 2 2 2 3 4 4 4 4 ...
# ..- attr(*, "contrasts")=List of 1
# .. ..$ f: chr "contr.treatment"
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : chr [1:200] "1" "2" "3" "4" ...
# .. ..$ : chr [1:14] "(Intercept)" "x1" "poly(x3, 3)1" "poly(x3, 3)2" ...
#$ coefficients: num [1:14] 0.1184 -0.0506 0.3032 0.1643 0.4269 ...
#$ residuals : num [1:200] -0.7311 -0.0795 -0.2495 0.4097 0.0495 ...
#$ effects : num [1:200] -0.351 -0.36 0.145 0.182 0.291 ...
#$ rank : int 14
#$ pivot : int [1:14] 1 2 3 4 5 6 7 8 9 10 ...
#$ qraux : num [1:14] 1.06 1.13 1.07 1.05 1.01 ...
#$ tol : num 1e-07
#$ pivoted : logi FALSE
The result of .lm.fit has no supported generic functions like summary, anova, predict, plot, etc. But inference for a linear model is easy, so it is straightforward to compute these yourself (provided you know the theory behind them):
- t-statistic table for regression coefficients (from $qr);
- F-statistic and ANOVA table (from $effects);
- residual standard error, R-squared and adjusted R-squared (from $residuals and $rank).
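As a sketch of the first and third items, the pieces can be reassembled from a .lm.fit result like so. The toy data here is my own, and the code assumes a full-rank fit with no column pivoting (i.e. $pivot is 1:p), which holds for this example:

```r
## sketch: rebuilding summary()-style inference from a `.lm.fit` result
## (assumes full rank and no column pivoting: z$pivot == 1:p)
set.seed(2)
X <- cbind(1, rnorm(50), rnorm(50))   ## toy model matrix with intercept
y <- rnorm(50)
z <- .lm.fit(X, y)
p <- z$rank
df.residual <- length(y) - p
sigma <- sqrt(sum(z$residuals ^ 2) / df.residual)  ## residual standard error
## extract the upper-triangular R factor stored in z$qr
R <- z$qr[seq_len(p), seq_len(p)]
R[lower.tri(R)] <- 0
XtX_inv <- chol2inv(R)                 ## (X'X)^{-1} = (R'R)^{-1}
se <- sigma * sqrt(diag(XtX_inv))      ## coefficient standard errors
tval <- z$coefficients / se            ## t-statistics
pval <- 2 * pt(abs(tval), df.residual, lower.tail = FALSE)
```

These values agree with the corresponding columns of summary(lm(y ~ X - 1)) up to machine precision.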
Finally, I offer a benchmark:
library(microbenchmark)
microbenchmark(wmlm = {wmlm(xmat, ymat, wmat)},
               lm = {for (i in 1:ncol(ymat))
                       lm(ymat[, i] ~ x1 + poly(x3, 3) + x2 * f, dat, weights = wmat[, i])})
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# wmlm 20.84512 23.02756 27.29539 24.49314 25.9027 79.8894 100 a
# lm 400.53000 405.10622 430.09787 414.42152 442.2640 535.9144 100 b
So a roughly 17x speedup is seen (based on median times).