R - How can I use the apply functions instead of iterating?


Question

I am trying to perform linear regressions for multiple dependent variables against an independent variable, one at a time.

When an observation is missing (NA), the entire row is left out of that particular regression.

I have done this by looping over each dependent-variable column.

fit <- list()
for (i in 1:2) {
    fit[[i]] <- lm(mydf$Ind_Var[which(!is.na(mydf[, 2 + i]))] ~ na.omit(mydf[, 2 + i]))
}

Without involving other packages (let's restrict ourselves to functions like lm, the apply family of functions, and do/do.call), how can I do this?

mydf = data.frame( 
"ID"    = rep( "A" , 25 ),
"Date"  = c( 1 : 25 ), 
"Dep_1" = c( 0.78670185, 0.15221561, NA, 0.85270392, 0.90057399, 0.75974473, 0.42026760, 0.64035871, 0.83012434, 0.04985492, 0.06619375, 0.36024745, 0.83969627, 0.45293842, 0.25272036, NA, 0.63783321, 0.42294695, 0.06726004, 0.14124547, 0.54590193, 0.99560087, 0.14255501, 0.41559977, 0.80120970) ,          
"Dep_2" = c( 0.736137983, 0.979317444, 0.901380500, 0.942325049, 0.420741297, NA, 0.243408607, 0.824064331, 0.462912557, NA, 0.710834065, 0.264922818, 0.797917063, 0.578866651, 0.955944058, 0.291149075, 0.437322581, 0.298153168, 0.579299049, 0.671718144, 0.545720702, 0.099175216, 0.808933227, 0.912825535, 0.417438973 ) ,          
"Ind_Var" = c( 75:51 )  )


My own attempt at converting it is:

apply( mydf[ ,-c(1:2) ] , 2 , function( x ) lm( mydf$Ind_Var[ which( !is.na( x ) ) ] ~ na.omit(x)  ) )

But this involves having mydf hardcoded.

I apologize if I have used any incorrect terms.

Recommended answer

How about the following approach?

# Specify the columns that contain your predictor variables
predIdx <- c(3, 4)

# lm(y ~ x), for x being a single predictor
lapply(predIdx, function(x) lm(mydf[, ncol(mydf)] ~ mydf[, x]))

Here I assume that the response is always in the last column of the data frame. All you need to specify manually are the indices of the columns that contain your predictors.

If you want to exclude the NAs manually, you could use complete.cases inside the lapply function. This shouldn't be necessary, because lm by default drops rows that have an NA in any model variable (the default na.action is na.omit).
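As a rough check of that claim, the sketch below fits the same model twice: once letting lm drop incomplete rows, and once filtering with complete.cases first. The data frame `toy` and its columns `x` and `y` are made up for illustration and are not the asker's mydf.

```r
# Hypothetical toy data with one NA in each column (not the asker's mydf)
toy <- data.frame(
  x = c(1, 2, NA, 4, 5, 6),
  y = c(2.1, 3.9, 6.2, NA, 9.8, 12.1)
)

# Default behaviour: lm drops the incomplete rows itself
fit_default <- lm(y ~ x, data = toy)

# Manual filtering: keep only rows where both x and y are present
keep <- complete.cases(toy[, c("x", "y")])
fit_manual <- lm(y ~ x, data = toy[keep, ])

# Both fits use the same rows, so the coefficients should agree
all.equal(coef(fit_default), coef(fit_manual))

# Number of observations actually used in the default fit
nobs(fit_default)
```

If the two fits agree, the manual filtering adds nothing beyond making the row selection explicit.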

I'm not sure what you mean by "having mydf hardcoded". You can wrap the above code inside a function to make it more general, so that it works for any data frame df, with the predictors given in columns predIdx and the response variable given in column respIdx.

one_at_a_time_LM <- function(df, predIdx, respIdx) {
    lapply(predIdx, function(x) lm(df[, respIdx] ~ df[, x]))
}

one_at_a_time_LM(mydf, c(3, 4), 5)
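One possible variation, offered only as a sketch: selecting columns by name rather than by position and building each formula with reformulate (from base R's stats package) keeps readable variable names in the fitted models and gives the returned list names. The function name `one_at_a_time_lm_named`, its argument names, and the data frame `toy` are hypothetical and not part of the original answer.

```r
# Hypothetical variant of the wrapper above that uses column names
one_at_a_time_lm_named <- function(df, pred_names, resp_name) {
  fits <- lapply(pred_names, function(p) {
    # Builds a formula such as Ind_Var ~ Dep_1 from the two strings
    lm(reformulate(p, response = resp_name), data = df)
  })
  names(fits) <- pred_names  # label each fit by its predictor column
  fits
}

# Made-up data with NAs in different rows of each predictor column
toy <- data.frame(
  Dep_1   = c(0.8, 0.2, NA, 0.9, 0.4, 0.6),
  Dep_2   = c(0.7, NA, 0.9, 0.4, NA, 0.3),
  Ind_Var = c(75, 74, 73, 72, 71, 70)
)

fits <- one_at_a_time_lm_named(toy, c("Dep_1", "Dep_2"), "Ind_Var")

# Each model drops only the rows that are NA in its own predictor
sapply(fits, nobs)
```

Because each regression is fitted separately, an NA in Dep_1 does not remove that row from the Dep_2 regression, which matches the row-wise behaviour the question asks for.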
