R - How can I use the apply functions instead of iterating?
Question
I am trying to run linear regressions for several dependent variables against a single independent variable, one at a time.
When an observation is missing (NA), the entire row is excluded from that particular regression.
So far I have done this by looping over each dependent-variable column.
fit = list()
for( i in 1 : 2 ) {
  fit[[i]] = lm( mydf$Ind_Var[ which( !is.na( mydf[ , (2+i) ] ) ) ] ~ na.omit( mydf[ , (2+i) ] ) )
}
How can I do this without involving other packages (let's restrict ourselves to functions like lm, the apply family, and do.call)?
mydf = data.frame(
"ID" = rep( "A" , 25 ),
"Date" = c( 1 : 25 ),
"Dep_1" = c( 0.78670185, 0.15221561, NA, 0.85270392, 0.90057399, 0.75974473, 0.42026760, 0.64035871, 0.83012434, 0.04985492, 0.06619375, 0.36024745, 0.83969627, 0.45293842, 0.25272036, NA, 0.63783321, 0.42294695, 0.06726004, 0.14124547, 0.54590193, 0.99560087, 0.14255501, 0.41559977, 0.80120970) ,
"Dep_2" = c( 0.736137983, 0.979317444, 0.901380500, 0.942325049, 0.420741297, NA, 0.243408607, 0.824064331, 0.462912557, NA, 0.710834065, 0.264922818, 0.797917063, 0.578866651, 0.955944058, 0.291149075, 0.437322581, 0.298153168, 0.579299049, 0.671718144, 0.545720702, 0.099175216, 0.808933227, 0.912825535, 0.417438973 ) ,
"Ind_Var" = c( 75:51 ) )
My own attempt at converting it is:
apply( mydf[ ,-c(1:2) ] , 2 , function( x ) lm( mydf$Ind_Var[ which( !is.na( x ) ) ] ~ na.omit(x) ) )
However, this involves hardcoding mydf.
I apologize if I have used any incorrect terms.
Recommended answer
How about the following approach?
# Specify the columns that contain your predictor variables
predIdx <- c(3, 4)
# lm(y ~ x), for x being a single predictor
lapply(predIdx, function(x) lm(mydf[, ncol(mydf)] ~ mydf[, x]))
Here I assume that the response is always in the last column of the data frame. All you need to specify manually are the indices of the columns that contain your predictors.
If you want to exclude the NAs manually, you could use complete.cases inside the lapply function. This shouldn't be necessary, because lm by default drops rows with missing values (its default na.action is na.omit).
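As a quick check of that claim, here is a minimal sketch comparing lm's default NA handling with manual filtering through complete.cases. The data frame toy and its columns y and x are hypothetical and are not part of the question's mydf; the sketch also assumes the session's na.action option has not been changed from its default.

```r
# Hypothetical toy data, not the asker's mydf.
toy <- data.frame(
  y = c(10, 9, 8, 7, 6, 5),
  x = c(0.2, NA, 0.5, 0.1, NA, 0.9)
)

# Default behaviour: rows with NA in any model variable are dropped.
fit_default <- lm(y ~ x, data = toy)

# Manual exclusion with complete.cases on the two columns used in the model.
keep <- complete.cases(toy[, c("y", "x")])
fit_manual <- lm(y ~ x, data = toy[keep, ])

# Both fits use the same rows, so the observation counts and coefficients agree.
n_default <- nobs(fit_default)
n_manual <- nobs(fit_manual)
same_coef <- isTRUE(all.equal(coef(fit_default), coef(fit_manual)))
```

If the two fits ever disagree, it is worth checking options("na.action"), since lm takes its default from there.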
I'm not sure what you mean by "having mydf hardcoded". You can wrap the code above in a function to make it work for any data frame df, with the predictors given in columns predIdx and the response variable (Ind_Var in your example) given in column respIdx.
one_at_a_time_LM <- function(df, predIdx, respIdx) {
  lapply(predIdx, function(x) lm(df[, respIdx] ~ df[, x]))
}
one_at_a_time_LM(mydf, c(3, 4), 5)
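A possible variant, offered only as a sketch: the same idea with column names instead of indices, building each formula with reformulate and passing the data frame through lm's data argument. The function name one_at_a_time_lm_by_name and the toy data are hypothetical and do not come from the question or the answer above.

```r
# Hypothetical helper: fit one lm per predictor column, selected by name.
one_at_a_time_lm_by_name <- function(df, pred_names, resp_name) {
  fits <- lapply(pred_names, function(p) {
    # reformulate("a", response = "resp") builds the formula resp ~ a
    lm(reformulate(p, response = resp_name), data = df)
  })
  # Name each fit after its predictor so results can be looked up by name.
  setNames(fits, pred_names)
}

# Hypothetical toy data with NAs in different rows of each predictor.
toy <- data.frame(
  a    = c(1, 2, NA, 4, 5, 6),
  b    = c(2, NA, 1, 5, 3, NA),
  resp = c(6, 5, 4, 3, 2, 1)
)

fits <- one_at_a_time_lm_by_name(toy, c("a", "b"), "resp")

# Each regression drops only the rows that are NA for its own predictor.
n_used <- sapply(fits, nobs)
```

With this form the coefficient labels show the column names (for example a) rather than df[, x], which tends to make the printed summaries easier to read.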