Fitting a linear model with multiple LHS

Question

I am new to R and I want to improve the following script with an *apply function (I have read about apply, but I couldn't manage to use it). I want to use the lm function on multiple dependent variables (which are columns in a data frame). I used

for (i in 1:3) {
  assign(paste0('lm.', names(data[i])), lm(formula = formula(i), data = data))
}

where formula(i) is defined as

formula=function(x)
{
  as.formula ( paste(names(data[x]),'~', paste0(names(data[-1:-3]), collapse = '+')), env=parent.frame() )
}

Thank you.

Answer

If I understand you correctly, you are working with a dataset like this:

set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
                  x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

x1, x2 and x3 are covariates, and y1, y2, y3 are three separate responses. You are trying to fit three linear models:

y1 ~ x1 + x2 + x3
y2 ~ x1 + x2 + x3
y3 ~ x1 + x2 + x3

Currently you are looping through y1, y2, y3, fitting one model at a time. You hope to speed the process up by replacing the for loop with lapply.

You are on the wrong track. lm() is an expensive operation. As long as your dataset is not small, the cost of the for loop is negligible. Replacing the for loop with lapply gives no performance gain.
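For reference, the loop from the question could be written with lapply roughly as below. This is only a sketch: the simulated dat and the names responses, covariates and fits are made up for illustration. It returns a named list instead of calling assign, which is tidier, but it still fits each model separately and is not expected to be faster.

```r
# Simulated data, assumed for illustration only
set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
                  x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

responses  <- names(dat)[1:3]      # first three columns are the responses
covariates <- names(dat)[-(1:3)]   # remaining columns are the covariates

# One lm() call per response, collected in a named list
fits <- lapply(setNames(responses, responses), function(y) {
  lm(reformulate(covariates, response = y), data = dat)
})

names(fits)
```

Each element, for example fits$y1, is an ordinary "lm" object.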

Since you have the same RHS (right-hand side of ~) for all three models, the model matrix is the same for all of them. Therefore, the QR factorization only needs to be done once for all models. lm allows this, and you can use:

fit <- lm(cbind(y1, y2, y3) ~ x1 + x2 + x3, data = dat)
#Coefficients:
#             y1         y2         y3       
#(Intercept)  -0.081155   0.042049   0.007261
#x1           -0.037556   0.181407  -0.070109
#x2           -0.334067   0.223742   0.015100
#x3            0.057861  -0.075975  -0.099762
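As a quick sanity check (a sketch on simulated data; the names fit_mlm and fit_y1 are made up here), each column of the coefficient matrix should match the coefficients from fitting that response on its own:

```r
# Simulated data, assumed for illustration only
set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
                  x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

# One fit with three responses on the LHS
fit_mlm <- lm(cbind(y1, y2, y3) ~ x1 + x2 + x3, data = dat)

# Separate fit for the first response only
fit_y1 <- lm(y1 ~ x1 + x2 + x3, data = dat)

# Compare the y1 column of the coefficient matrix with the separate fit
same_coef <- isTRUE(all.equal(coef(fit_mlm)[, "y1"], coef(fit_y1)))
same_coef
```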

If you check str(fit), you will see that this is not a list of three linear models; instead, it is a single linear model with a single $qr object, but with multiple LHS. So $coefficients, $residuals and $fitted.values are matrices. The resulting linear model has an additional "mlm" class besides the usual "lm" class. I created a special mlm tag collecting some questions on the theme, summarized by its tag wiki.
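A short way to see this structure without reading the full str(fit) output is sketched below, again on simulated data assumed for illustration:

```r
# Simulated data, assumed for illustration only
set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
                  x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

fit <- lm(cbind(y1, y2, y3) ~ x1 + x2 + x3, data = dat)

class(fit)              # "mlm" comes before "lm"
dim(coef(fit))          # one row per coefficient, one column per response
dim(residuals(fit))     # one row per observation, one column per response
dim(fitted(fit))        # same shape as the residuals
class(fit$qr)           # a single QR object shared by all responses
```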

If you have a lot more covariates, you can avoid typing or pasting the formula by using "." on the right-hand side:

fit <- lm(cbind(y1, y2, y3) ~ ., data = dat)
#Coefficients:
#             y1         y2         y3       
#(Intercept)  -0.081155   0.042049   0.007261
#x1           -0.037556   0.181407  -0.070109
#x2           -0.334067   0.223742   0.015100
#x3            0.057861  -0.075975  -0.099762

Caution: Do not write

y1 + y2 + y3 ~ x1 + x2 + x3

This will treat y = y1 + y2 + y3 as a single response. Use cbind().
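The difference is easy to see on simulated data (a sketch; fit_sum and fit_mlm are names made up for illustration). With +, you get an ordinary single-response model whose coefficients are those of the summed response, which by linearity of least squares equal the row sums of the multi-response coefficient matrix:

```r
# Simulated data, assumed for illustration only
set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
                  x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

fit_sum <- lm(y1 + y2 + y3 ~ x1 + x2 + x3, data = dat)       # single response
fit_mlm <- lm(cbind(y1, y2, y3) ~ x1 + x2 + x3, data = dat)  # three responses

class(fit_sum)          # just "lm"
length(coef(fit_sum))   # a plain vector of 4 coefficients
dim(coef(fit_mlm))      # a 4 x 3 matrix

# Coefficients of the summed response equal the row sums of the matrix
sum_matches <- isTRUE(all.equal(coef(fit_sum), rowSums(coef(fit_mlm))))
sum_matches
```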

I am interested in a generalization. I have a data frame df whose first n columns are dependent variables (y1, y2, y3, ...) and whose next m columns are independent variables (x1, x2, x3, ...). For n = 3 and m = 3 this is fit <- lm(cbind(y1, y2, y3) ~ ., data = dat). But how can I do this automatically, using the structure of df? I mean something like (for i in (1:n)) fit <- lm(cbind(df[something]) ~ df[something], data = dat), where I have been creating that "something" with paste and paste0. Thank you.

So you are programming your formula, or want to dynamically generate / construct model formulae in the loop. There are many ways to do this, and many Stack Overflow questions are about this. There are commonly two approaches:

  1. Use reformulate;
  2. Use paste / paste0 together with formula / as.formula.

I prefer reformulate for its neatness; however, it does not support multiple LHS in the formula. It also needs some special treatment if you want to transform the LHS. So in the following I will use the paste solution.

For your data frame df, you may do

paste0("cbind(", paste(names(df)[1:n], collapse = ", "), ")", " ~ .")

A neater way is to use sprintf and toString to construct the LHS:

sprintf("cbind(%s) ~ .", toString(names(df)[1:n]))

Here is an example using the iris dataset:

string_formula <- sprintf("cbind(%s) ~ .", toString(names(iris)[1:2]))
# "cbind(Sepal.Length, Sepal.Width) ~ ."

You can pass this string formula to lm, as lm will automatically coerce it to the formula class. Or you may do the coercion yourself using formula (or as.formula):

formula(string_formula)
# cbind(Sepal.Length, Sepal.Width) ~ .
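Putting the pieces together for the layout described in the comment (first n columns are responses, the rest are covariates), a minimal sketch might look like this. The simulated df and the value n = 3 are assumptions for illustration.

```r
# Simulated data frame, assumed for illustration only:
# the first n columns are responses, the remaining columns are covariates
set.seed(0)
df <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
                 x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))
n <- 3

# Build "cbind(y1, y2, y3) ~ ." from the column names
string_formula <- sprintf("cbind(%s) ~ .", toString(names(df)[seq_len(n)]))

# "." expands to every column of df that is not already on the LHS
fit <- lm(as.formula(string_formula), data = df)

colnames(coef(fit))   # the responses
rownames(coef(fit))   # the intercept and the covariates
```

No loop over the responses is needed, since one call fits all n of them.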

Remark:

This multiple LHS formula is also supported elsewhere in R core:
