Fitting a linear model with multiple LHS

Question

I am new to R and I want to improve the following script with an *apply function (I have read about apply, but I couldn't manage to use it). I want to fit lm on multiple dependent variables (which are columns in a data frame), one at a time. I used:

for (i in 1:3) {
  assign(paste0('lm.', names(data[i])), lm(formula = formula(i), data = data))
}

where formula(i) is defined as

formula <- function(x) {
  as.formula(paste(names(data[x]), '~', paste0(names(data[-1:-3]), collapse = '+')),
             env = parent.frame())
}

Thanks.

Answer

If I understand you correctly, you are working with a dataset like this:

set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
                  x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

x1, x2 and x3 are covariates, and y1, y2 and y3 are three independent responses. You are trying to fit three linear models:

y1 ~ x1 + x2 + x3
y2 ~ x1 + x2 + x3
y3 ~ x1 + x2 + x3

Currently you are using a loop over y1, y2 and y3, fitting one model at a time. You hope to speed the process up by replacing the for loop with lapply.

You are on the wrong track. lm() is an expensive operation. As long as your dataset is not tiny, the cost of the for loop itself is negligible, so replacing the for loop with lapply gives no performance gain.
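For reference, here is a sketch of the *apply version the question asks for. It assumes the simulated dat above, with the responses in columns 1 to 3 and the covariates in the remaining columns. Instead of creating lm.y1, lm.y2, ... with assign(), it returns a named list of ordinary "lm" fits. It still makes three separate lm() calls, so it does the same work as the loop.

```r
set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
                  x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

responses  <- names(dat)[1:3]     # assumed layout: responses come first
covariates <- names(dat)[-(1:3)]  # all remaining columns are covariates

# One lm() per response; setNames() makes the result list addressable by name
fits <- lapply(setNames(responses, responses), function(resp) {
  lm(reformulate(covariates, response = resp), data = dat)
})

coef(fits$y1)  # intercept plus x1, x2, x3 for the y1 model
```

A list like this is usually easier to work with afterwards than a set of separately assigned variables. For example, lapply(fits, summary) summarizes every model in one call.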

Since all three models have the same RHS (the right-hand side of ~), they share the same model matrix. Therefore, the QR factorization needs to be done only once for all of them. lm allows this, and you can use:

fit <- lm(cbind(y1, y2, y3) ~ x1 + x2 + x3, data = dat)
#Coefficients:
#             y1         y2         y3       
#(Intercept)  -0.081155   0.042049   0.007261
#x1           -0.037556   0.181407  -0.070109
#x2           -0.334067   0.223742   0.015100
#x3            0.057861  -0.075975  -0.099762
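The following sketch makes the "factor once" argument concrete. It builds the shared model matrix, computes its QR decomposition a single time with qr(), and then solves all three least-squares problems with one qr.coef() call on the response matrix. The result should match coef(fit) up to floating-point error.

```r
set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
                  x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))
fit <- lm(cbind(y1, y2, y3) ~ x1 + x2 + x3, data = dat)

# Shared design matrix: intercept, x1, x2, x3
X <- model.matrix(~ x1 + x2 + x3, data = dat)
# Response matrix: one column per model
Y <- as.matrix(dat[, c("y1", "y2", "y3")])

qrX <- qr(X)          # factorize X exactly once
B <- qr.coef(qrX, Y)  # solve for all three responses from that single QR

all.equal(unname(B), unname(coef(fit)))
```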

If you check str(fit), you will see that this is not a list of three linear models; instead, it is a single linear model with a single $qr object, but with multiple LHS. So $coefficients, $residuals and $fitted.values are matrices. The resulting linear model has an additional "mlm" class besides the usual "lm" class. I created a special mlm tag collecting some questions on the theme, summarized by its tag wiki.
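A quick way to see the "mlm" structure described above, without reading the whole str(fit) output, is to look at the class and the dimensions of the components. This sketch uses the same simulated data.

```r
set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
                  x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))
fit <- lm(cbind(y1, y2, y3) ~ x1 + x2 + x3, data = dat)

class(fit)           # "mlm" "lm"
dim(coef(fit))       # 4 3: terms in rows, responses in columns
dim(residuals(fit))  # 30 3: one residual column per response
colnames(coef(fit))  # "y1" "y2" "y3"
```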

If you have many more covariates, you can avoid typing or pasting the formula by using ., which stands for all columns of data not already used in the formula:

fit <- lm(cbind(y1, y2, y3) ~ ., data = dat)
#Coefficients:
#             y1         y2         y3       
#(Intercept)  -0.081155   0.042049   0.007261
#x1           -0.037556   0.181407  -0.070109
#x2           -0.334067   0.223742   0.015100
#x3            0.057861  -0.075975  -0.099762
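You can check which terms . expanded to by inspecting the fitted model's terms object. The variables inside cbind() are excluded from the expansion, so the result matches the explicitly written formula.

```r
set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
                  x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

fit_dot      <- lm(cbind(y1, y2, y3) ~ ., data = dat)
fit_explicit <- lm(cbind(y1, y2, y3) ~ x1 + x2 + x3, data = dat)

attr(terms(fit_dot), "term.labels")           # "x1" "x2" "x3"
all.equal(coef(fit_dot), coef(fit_explicit))  # TRUE
```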

Warning: do not write

y1 + y2 + y3 ~ x1 + x2 + x3

This will treat y = y1 + y2 + y3 as a single response. Use cbind().
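To see the difference, compare the two forms on the simulated data. With +, the left-hand side is evaluated arithmetically. The result is a plain "lm" fit with a single coefficient vector, identical to regressing on I(y1 + y2 + y3).

```r
set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
                  x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

wrong  <- lm(y1 + y2 + y3 ~ x1 + x2 + x3, data = dat)     # LHS summed, not stacked
summed <- lm(I(y1 + y2 + y3) ~ x1 + x2 + x3, data = dat)  # explicit sum

is.matrix(coef(wrong))                # FALSE: one response, one coefficient vector
all.equal(coef(wrong), coef(summed))  # TRUE: same model as regressing on the sum
```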

I am interested in a generalization. I have a data frame df, where the first n columns are dependent variables (y1, y2, y3, ...) and the next m columns are independent variables (x1, x2, x3, ...). For n = 3 and m = 3 it is fit <- lm(cbind(y1, y2, y3) ~ ., data = dat). But how can I do this automatically, using the structure of df? I mean something like (for i in (1:n)) fit <- lm(cbind(df[something]) ~ df[something], data = dat). I have created that "something" with paste and paste0. Thank you.

So you are programming your formula, or want to dynamically generate / construct model formulae in the loop. There are many ways to do this, and many Stack Overflow questions are about this. There are commonly two approaches:

  1. use reformulate;
  2. use paste / paste0 and formula / as.formula.
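For a single response, both approaches build the same formula. In this sketch, the covariate names are taken from the simulated dat and the response "y1" is used as an example. reformulate() receives the term labels and the response name, while the paste route builds a string and coerces it with as.formula().

```r
set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
                  x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))
covariates <- names(dat)[-(1:3)]  # "x1" "x2" "x3"

# Approach 1: reformulate(termlabels, response)
f1 <- reformulate(covariates, response = "y1")

# Approach 2: paste a string, then coerce it to a formula
f2 <- as.formula(paste("y1 ~", paste(covariates, collapse = " + ")))

deparse(f1)                           # "y1 ~ x1 + x2 + x3"
identical(deparse(f1), deparse(f2))   # TRUE
```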

I prefer reformulate for its neatness; however, it does not support multiple LHS in the formula. It also needs some special treatment if you want to transform the LHS. So in the following I will use the paste solution.

For your data frame df, you may do

paste0("cbind(", paste(names(df)[1:n], collapse = ", "), ")", " ~ .")

A nicer-looking way is to use sprintf and toString to construct the LHS:

sprintf("cbind(%s) ~ .", toString(names(df)[1:n]))

Here is an example using the iris dataset:

string_formula <- sprintf("cbind(%s) ~ .", toString(names(iris)[1:2]))
# "cbind(Sepal.Length, Sepal.Width) ~ ."

You can pass this string formula to lm, because lm will automatically coerce it to the formula class. Alternatively, you can do the coercion yourself using formula (or as.formula):

formula(string_formula)
# cbind(Sepal.Length, Sepal.Width) ~ .
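Here is one way to combine these pieces for the layout described in the comment, where the first n columns are responses and every remaining column is a covariate. The helper name fit_first_n is hypothetical. The sketch relies on ~ . to pick up all non-response columns, so it assumes every such column should enter the model.

```r
# Hypothetical helper: the first n columns of df are the responses;
# "~ ." uses all remaining columns as covariates.
fit_first_n <- function(df, n) {
  lhs <- sprintf("cbind(%s)", toString(names(df)[seq_len(n)]))
  lm(as.formula(paste(lhs, "~ .")), data = df)
}

fit_iris <- fit_first_n(iris, 2)
class(fit_iris)           # "mlm" "lm"
colnames(coef(fit_iris))  # "Sepal.Length" "Sepal.Width"
```

Because the formula is built from names(df), the same call works for any n without a loop. A single mlm fit replaces the n separate lm() calls.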

Remark:

This multiple LHS formula is also supported elsewhere in R core: