Applying lm() and predict() to multiple columns in a data frame

Question

I have an example dataset below.

train<-data.frame(x1 = c(4,5,6,4,3,5), x2 = c(4,2,4,0,5,4), x3 = c(1,1,1,0,0,1),
                  x4 = c(1,0,1,1,0,0), x5 = c(0,0,0,1,1,1))

Suppose I want to create separate models for columns x3, x4 and x5, each based on columns x1 and x2. For example:

lm1 <- lm(x3 ~ x1 + x2, data = train)
lm2 <- lm(x4 ~ x1 + x2, data = train)
lm3 <- lm(x5 ~ x1 + x2, data = train)

I want to then take these models and apply them to a testing set using predict, and then create a matrix that has each model outcome as a column.

test <- data.frame(x1 = c(4,3,2,1,5,6), x2 = c(4,2,1,6,8,5))
p1 <- predict(lm1, newdata = test)
p2 <- predict(lm2, newdata = test)
p3 <- predict(lm3, newdata = test)
final <- cbind(p1, p2, p3)

This is a simplified version where you can do it step by step, but the actual data is far too large for that. Is there a way to create a function or use a for statement to combine this into one or two steps?

Recommended answer

I was inclined to close your question as a duplicate of "Fitting a linear model with multiple LHS", but sadly the prediction issue is not addressed there. On the other hand, "Prediction of 'mlm' linear model object from lm()" talks about prediction, but it is a bit far from your situation, as you work with the formula interface instead of the matrix interface.

I did not manage to find a perfect duplicate target under the "mlm" tag, so I think it is a good idea to contribute another answer for this tag. As I said in the linked questions, predict.mlm does not support se.fit, and at the moment this is also not covered anywhere under the "mlm" tag. So I will take this chance to fill that gap.

Here is a function that returns the standard errors of the predicted means (what se.fit would give for a single-response model):

f <- function (mlmObject, newdata) {
  ## model formula
  form <- formula(mlmObject)
  ## drop response (LHS)
  form[[2]] <- NULL
  ## prediction matrix
  X <- model.matrix(form, newdata)
  Q <- forwardsolve(t(qr.R(mlmObject$qr)), t(X))
  ## unscaled prediction standard error
  unscaled.se <- sqrt(colSums(Q ^ 2))
  ## residual standard error
  sigma <- sqrt(colSums(residuals(mlmObject) ^ 2) / mlmObject$df.residual)
  ## scaled prediction standard error
  tcrossprod(unscaled.se, sigma)
}
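
The algebra behind f can be sketched as follows. The symbols are introduced here for illustration and are not in the original answer: X is the n-by-p design matrix of the fitted model, assumed to have full column rank so that the QR factorization stored by lm() involves no column pivoting, R is its triangular factor, x_i is the i-th row of the prediction matrix written as a column vector, and y_j is the j-th response. Note that the variable named Q in the code holds the solved columns R^{-T} x_i, not the orthogonal factor of the QR decomposition.

```latex
\begin{align*}
\operatorname{Var}\!\left(x_i^\top \hat\beta_j\right)
  &= \sigma_j^2 \, x_i^\top (X^\top X)^{-1} x_i
  && \text{(variance of the fitted mean)} \\
X^\top X &= R^\top R
  && \text{(since } X = QR,\ Q^\top Q = I\text{)} \\
x_i^\top (R^\top R)^{-1} x_i
  &= \left\lVert R^{-\top} x_i \right\rVert_2^2
  && \text{(forwardsolve, then colSums of squares)} \\
\hat\sigma_j
  &= \sqrt{\frac{\sum_{k=1}^{n} e_{kj}^2}{n - p}}
  && \text{(residual standard error of response } j\text{)} \\
\operatorname{se}_{ij}
  &= \left\lVert R^{-\top} x_i \right\rVert_2 \, \hat\sigma_j
  && \text{(outer product via tcrossprod)}
\end{align*}
```

Because the unscaled factor depends only on the predictors and the scale factor depends only on the response, the full matrix of standard errors is an outer product of two vectors, which is why a single tcrossprod call suffices.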

For your example, you can do:

## fit an `mlm`
fit <- lm(cbind(x3, x4, x5) ~ x1 + x2, data = train)

## prediction (mean only)
pred <- predict(fit, newdata = test)

#            x3          x4         x5
#1  0.555956679  0.38628159 0.60649819
#2  0.003610108  0.47653430 0.95848375
#3 -0.458483755  0.48014440 1.27256318
#4 -0.379061372 -0.03610108 1.35920578
#5  1.288808664  0.12274368 0.17870036
#6  1.389891697  0.46570397 0.01624549

## prediction error
pred.se <- f(fit, newdata = test)

#          [,1]      [,2]      [,3]
#[1,] 0.1974039 0.3321300 0.2976205
#[2,] 0.3254108 0.5475000 0.4906129
#[3,] 0.5071956 0.8533510 0.7646849
#[4,] 0.6583707 1.1077014 0.9926075
#[5,] 0.5049637 0.8495959 0.7613200
#[6,] 0.3552794 0.5977537 0.5356451

We can verify that f is correct:

## `lm1`, `lm2` and `lm3` are defined in your question
predict(lm1, test, se.fit = TRUE)$se.fit
#        1         2         3         4         5         6 
#0.1974039 0.3254108 0.5071956 0.6583707 0.5049637 0.3552794 

predict(lm2, test, se.fit = TRUE)$se.fit
#        1         2         3         4         5         6 
#0.3321300 0.5475000 0.8533510 1.1077014 0.8495959 0.5977537 

predict(lm3, test, se.fit = TRUE)$se.fit
#        1         2         3         4         5         6 
#0.2976205 0.4906129 0.7646849 0.9926075 0.7613200 0.5356451 
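
If you would rather keep one lm object per response, as the question's for-statement idea suggests, a loop-style alternative can be sketched as below. This is only an illustration and not part of the original answer: the names responses, predictors, fit_one, results, final and final.se are made up for the example, and it assumes every model uses the same predictors x1 and x2. It builds each formula with reformulate(), calls predict() with se.fit = TRUE, and binds the results into matrices.

```r
## data from the question
train <- data.frame(x1 = c(4,5,6,4,3,5), x2 = c(4,2,4,0,5,4), x3 = c(1,1,1,0,0,1),
                    x4 = c(1,0,1,1,0,0), x5 = c(0,0,0,1,1,1))
test <- data.frame(x1 = c(4,3,2,1,5,6), x2 = c(4,2,1,6,8,5))

## hypothetical helper names, chosen for this sketch
responses  <- c("x3", "x4", "x5")
predictors <- c("x1", "x2")

## fit one single-response model and predict on the test set
fit_one <- function(y) {
  model <- lm(reformulate(predictors, response = y), data = train)
  predict(model, newdata = test, se.fit = TRUE)
}

results <- lapply(responses, fit_one)
names(results) <- responses

## one column per response: predicted means and their standard errors
final    <- sapply(results, function(r) r$fit)
final.se <- sapply(results, function(r) r$se.fit)

## the means should agree with the single `mlm` fit
fit_mlm <- lm(cbind(x3, x4, x5) ~ x1 + x2, data = train)
all.equal(final, predict(fit_mlm, newdata = test), check.attributes = FALSE)
```

The mlm fit factorizes the design matrix once and reuses it for every response, so it is likely the better choice when there are many response columns; the loop is mainly useful when you need the individual lm objects or when the predictors differ between models.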
