Fitting a linear model with multiple LHS
Question
I am new to R and I want to improve the following script with an *apply function (I have read about apply, but I couldn't manage to use it). I want to use the lm function on multiple independent variables (which are columns in a data frame). I used
for (i in 1:3) {
  assign(paste0('lm.', names(data[i])), lm(formula = formula(i), data = data))
}
where formula(i) is defined as
formula = function(x)
{
  as.formula(paste(names(data[x]), '~', paste0(names(data[-1:-3]), collapse = '+')), env = parent.frame())
}
Thanks.
Recommended answer
If I understand you correctly, you are working with a dataset like this:
set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))
x1, x2 and x3 are covariates, and y1, y2, y3 are three independent responses. You are trying to fit three linear models:
y1 ~ x1 + x2 + x3
y2 ~ x1 + x2 + x3
y3 ~ x1 + x2 + x3
Currently you are looping through y1, y2, y3, fitting one model at a time. You hope to speed the process up by replacing the for loop with lapply.
You are on the wrong track. lm() is an expensive operation. As long as your dataset is not small, the cost of the for loop is negligible by comparison. Replacing the for loop with lapply gives no performance gain.
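For reference, the loop from the question can still be written with lapply if the goal is tidier code rather than speed. The following is only a minimal sketch, assuming the simulated dat from above; the names responses, covariates and fits are illustrative and not part of the original question.

```r
set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
                  x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

responses  <- c("y1", "y2", "y3")   # illustrative names
covariates <- c("x1", "x2", "x3")

# One lm() call per response; each call repeats the same QR factorization.
fits <- lapply(responses, function(y) {
  lm(reformulate(covariates, response = y), data = dat)
})
names(fits) <- responses

coef(fits$y1)
```

This returns a named list of models instead of creating variables with assign, but it does the same amount of work as the for loop.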
Since all three models have the same RHS (right-hand side of ~), the model matrix is the same for all three. Therefore, the QR factorization needs to be done only once for all models. lm allows this, and you can use:
fit <- lm(cbind(y1, y2, y3) ~ x1 + x2 + x3, data = dat)
#Coefficients:
# y1 y2 y3
#(Intercept) -0.081155 0.042049 0.007261
#x1 -0.037556 0.181407 -0.070109
#x2 -0.334067 0.223742 0.015100
#x3 0.057861 -0.075975 -0.099762
If you check str(fit), you will see that this is not a list of three linear models; instead, it is a single linear model with a single $qr object, but with multiple LHS. So $coefficients, $residuals and $fitted.values are matrices. The resulting linear model has an additional "mlm" class besides the usual "lm" class. I created a special mlm tag collecting some questions on the theme, summarized by its tag wiki.
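A quick way to see this structure is to inspect the fitted object directly. The sketch below assumes the simulated dat from above; fit_y1 is an illustrative name for a single-response fit used only for comparison.

```r
set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
                  x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

fit <- lm(cbind(y1, y2, y3) ~ x1 + x2 + x3, data = dat)

class(fit)            # contains both "mlm" and "lm"
dim(coef(fit))        # rows: intercept + 3 covariates; columns: 3 responses
dim(residuals(fit))   # rows: 30 observations; columns: 3 responses

# Each column should agree with the corresponding single-response fit.
fit_y1 <- lm(y1 ~ x1 + x2 + x3, data = dat)
all.equal(unname(coef(fit)[, "y1"]), unname(coef(fit_y1)))
```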
If you have many more covariates, you can avoid typing or pasting the formula by using . on the RHS:
fit <- lm(cbind(y1, y2, y3) ~ ., data = dat)
#Coefficients:
# y1 y2 y3
#(Intercept) -0.081155 0.042049 0.007261
#x1 -0.037556 0.181407 -0.070109
#x2 -0.334067 0.223742 0.015100
#x3 0.057861 -0.075975 -0.099762
Caution: do not write
y1 + y2 + y3 ~ x1 + x2 + x3
This will treat y = y1 + y2 + y3 as a single response. Use cbind().
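The difference is easy to check. This is a minimal sketch assuming the simulated dat from above; fit_sum and fit_multi are illustrative names.

```r
set.seed(0)
dat <- data.frame(y1 = rnorm(30), y2 = rnorm(30), y3 = rnorm(30),
                  x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))

fit_sum   <- lm(y1 + y2 + y3 ~ x1 + x2 + x3, data = dat)       # one summed response
fit_multi <- lm(cbind(y1, y2, y3) ~ x1 + x2 + x3, data = dat)  # three responses

is.matrix(coef(fit_sum))     # FALSE: a plain coefficient vector
is.matrix(coef(fit_multi))   # TRUE: one column per response

# Least squares is linear in the response, so the summed fit should equal
# the row sums of the multi-response coefficients.
all.equal(unname(coef(fit_sum)), unname(rowSums(coef(fit_multi))))
```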
I am interested in a generalization. I have a data frame df, where the first n columns are dependent variables (y1, y2, y3, ...) and the next m columns are independent variables (x1 + x2 + x3 + ...). For n = 3 and m = 3 it is fit <- lm(cbind(y1, y2, y3) ~ ., data = dat). But how can I do this automatically, using the structure of df? I mean something like (for i in (1:n)) fit <- lm(cbind(df[something] ~ df[something], data = dat)). I have created that "something" with paste and paste0. Thank you.
So you are programming your formula, or want to dynamically generate / construct model formulae in a loop. There are many ways to do this, and many Stack Overflow questions are about it. There are commonly two approaches:
- use reformulate;
- use paste / paste0 and formula / as.formula.
I prefer reformulate for its neatness; however, it does not support multiple LHS in the formula. It also needs some special treatment if you want to transform the LHS. So in the following I will use the paste solution.
For your data frame df, you may do
paste0("cbind(", paste(names(df)[1:n], collapse = ", "), ")", " ~ .")
A nicer-looking way is to use sprintf and toString to construct the LHS:
sprintf("cbind(%s) ~ .", toString(names(df)[1:n]))
Here is an example using the iris dataset:
string_formula <- sprintf("cbind(%s) ~ .", toString(names(iris)[1:2]))
# "cbind(Sepal.Length, Sepal.Width) ~ ."
You can pass this string formula to lm, as lm will automatically coerce it into formula class. Or you may do the coercion yourself using formula (or as.formula):
formula(string_formula)
# cbind(Sepal.Length, Sepal.Width) ~ .
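Putting the pieces together, the construction can be wrapped in a small function. This is only a sketch: fit_first_n is a hypothetical helper name, and it assumes the first n columns of df are the responses, all remaining columns are covariates, and the column names are syntactically valid R names.

```r
# Hypothetical helper: fit a multi-response lm where the first n columns of
# df are the responses and every other column is a covariate.
fit_first_n <- function(df, n) {
  string_formula <- sprintf("cbind(%s) ~ .", toString(names(df)[seq_len(n)]))
  lm(formula(string_formula), data = df)
}

fit_iris <- fit_first_n(iris, 2)
colnames(coef(fit_iris))   # one column per response
rownames(coef(fit_iris))   # intercept plus terms from the remaining columns
```

Building the string inside the function keeps the call site down to the data frame and the number of response columns, which is what the follow-up question asks for.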
Remark:
This multiple LHS formula is also supported elsewhere in R core:
- the formula method for function aggregate;
- ANOVA analysis with aov.
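As an illustration of both, here is a minimal sketch using the built-in iris dataset; agg and fit_aov are illustrative names.

```r
# aggregate(): mean of two columns per species, using a cbind() LHS
agg <- aggregate(cbind(Sepal.Length, Sepal.Width) ~ Species,
                 data = iris, FUN = mean)
agg

# aov(): a cbind() LHS fits both responses in a single call
fit_aov <- aov(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris)
class(fit_aov)
```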