Writing a loop/function to generate various linear regressions on the same data frame

Question

I am learning to write loops and functions in R, and I still haven't fully understood how to do it. Currently, I need to write a loop or function (I'm not sure which would be better) that fits several linear regression models on the same data frame.

I have data like this:

dataset <- read.table(text = 
"ID  A_2 B_2 C_2 A_1 B_1 C_1 AGE
M1  10  6   6   8   8   9   25
M2  50  69  54  67  22  44  16
M3  5   80  44  78  5   55  18
M4  60  70  52  89  3   56  28
M5  60  5   34  90  80  56  34
M6  55  55  67  60  100 77  54", header = TRUE, stringsAsFactors = FALSE)

I am building models like this:

model1 <- lm(A_2 ~ A_1 + AGE, data = dataset)

model2 <- lm(B_2 ~ B_1 + AGE, data = dataset)

model3 <- lm(C_2 ~ C_1 + AGE, data = dataset)

I need to write a loop that:

  • takes variable _2 (the dependent variable) and variable _1 (the independent variable), plus covariates such as age
  • creates the lm models and stores the outputs (t-value, p-value, confidence intervals, etc.) in a data.frame that I can then print:
Dep_va  Ind_var  Covariates  Pvalue  upper.cI  low.cI

A_2     A_1      age
B_2     B_1      age
C_2     C_1      age
D_2     D_1      age

Recommended answer

Here is a base R approach to the problem using lapply loops.

First, if you want to automatically extract the variable names ending in _2, which should be all of your dependent variables, you could use the following code:

# Select all column names ending in `_2`; these should all be dependent variables.
dep_vars <- grep("_2$", colnames(dataset), value = TRUE)

# Strip the `_2` suffix to get the common stem, which can be used to select
# both the dependent (`_2`) and independent (`_1`) variable of each pair.
reg_vars <- gsub("_2$", "", dep_vars)
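As a quick check, the two helper vectors can be printed for the sample data. This is a minimal sketch that rebuilds the question's `dataset` inline so that it runs on its own; the final `stopifnot` line is an added sanity check, not part of the original answer.

```r
# Rebuild the sample data from the question so the snippet is self-contained.
dataset <- read.table(text =
"ID  A_2 B_2 C_2 A_1 B_1 C_1 AGE
M1  10  6   6   8   8   9   25
M2  50  69  54  67  22  44  16
M3  5   80  44  78  5   55  18
M4  60  70  52  89  3   56  28
M5  60  5   34  90  80  56  34
M6  55  55  67  60  100 77  54", header = TRUE, stringsAsFactors = FALSE)

dep_vars <- grep("_2$", colnames(dataset), value = TRUE)
reg_vars <- gsub("_2$", "", dep_vars)

print(dep_vars)  # "A_2" "B_2" "C_2"
print(reg_vars)  # "A" "B" "C"

# Added sanity check: every stem must also have a matching `_1` column.
stopifnot(all(paste0(reg_vars, "_1") %in% colnames(dataset)))
```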

Then you can use this in your lapply loop to create the formulas:

full_results <- lapply(reg_vars, function(i) {
  # Build e.g. A_2 ~ A_1 + AGE from the stem "A", then fit and summarise.
  summary(lm(as.formula(paste0(i, "_2 ~ ", i, "_1 + AGE")), data = dataset))
})
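Since the question asks about a loop versus a function, the same step could also be wrapped in a small function. The sketch below is one possible variant, not the answer's own code: the name `fit_models` and its `covariates` argument are hypothetical, and it uses base R's `reformulate()` instead of pasting a formula string. It returns the fitted `lm` objects rather than their summaries, so other extractors can still be applied later.

```r
# Rebuild the sample data from the question so the snippet is self-contained.
dataset <- read.table(text =
"ID  A_2 B_2 C_2 A_1 B_1 C_1 AGE
M1  10  6   6   8   8   9   25
M2  50  69  54  67  22  44  16
M3  5   80  44  78  5   55  18
M4  60  70  52  89  3   56  28
M5  60  5   34  90  80  56  34
M6  55  55  67  60  100 77  54", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: fit <stem>_2 ~ <stem>_1 + covariates for each stem.
fit_models <- function(data, stems, covariates = "AGE") {
  models <- lapply(stems, function(stem) {
    # reformulate() builds a formula from term labels and a response name.
    f <- reformulate(c(paste0(stem, "_1"), covariates),
                     response = paste0(stem, "_2"))
    lm(f, data = data)
  })
  # Name each model after its dependent variable for easy lookup.
  names(models) <- paste0(stems, "_2")
  models
}

models <- fit_models(dataset, c("A", "B", "C"))
print(names(models))
print(coef(models$A_2))
```

Adding another covariate would then only require changing the `covariates` argument, assuming the corresponding column exists in the data frame.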

Now you have a list of linear regression summaries from which you can extract the information you want. I think this is the output you are after, but please clarify if not:

print_results <- lapply(full_results, function(i) {
  # vars: response, predictor, covariate; row 2 of the coefficient table is the `_1` predictor.
  vars <- names(attributes(i[["terms"]])$dataClasses)
  est  <- i[["coefficients"]][2, 1]
  se   <- i[["coefficients"]][2, 2]
  data.frame(
    Dep_va     = vars[1],
    Ind_var    = vars[2],
    Covariates = vars[3],
    Pvalue     = i[["coefficients"]][2, 4],
    upper.cI   = est + 1.96 * se,
    low.cI     = est - 1.96 * se
  )
})

That code gives you a list of data frames. If you want to combine them into a single data.frame:

final_results<-do.call("rbind",print_results)

Output:

  Dep_va Ind_var Covariates     Pvalue upper.cI     low.cI
1    A_2     A_1        AGE 0.25753805 1.113214 -0.1877324
2    B_2     B_1        AGE 0.68452053 1.211355 -1.9292236
3    C_2     C_1        AGE 0.04827506 1.497688  0.3661343
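One caveat on the intervals: the 1.96 multiplier is a normal approximation. With six observations and three estimated coefficients, each model has only 3 residual degrees of freedom, where the 97.5% t quantile is about 3.18, so t-based intervals are noticeably wider. The sketch below is an alternative under that assumption, not the answer's own code: it uses base R's `confint()` on the fitted models and also adds the t-value the question asked for. The names `rows` and `exact_results` are made up for illustration.

```r
# Rebuild the sample data from the question so the snippet is self-contained.
dataset <- read.table(text =
"ID  A_2 B_2 C_2 A_1 B_1 C_1 AGE
M1  10  6   6   8   8   9   25
M2  50  69  54  67  22  44  16
M3  5   80  44  78  5   55  18
M4  60  70  52  89  3   56  28
M5  60  5   34  90  80  56  34
M6  55  55  67  60  100 77  54", header = TRUE, stringsAsFactors = FALSE)

# Fit <stem>_2 ~ <stem>_1 + AGE and keep the lm objects (not the summaries).
models <- lapply(c("A", "B", "C"), function(stem) {
  lm(reformulate(c(paste0(stem, "_1"), "AGE"), response = paste0(stem, "_2")),
     data = dataset)
})

rows <- lapply(models, function(m) {
  coefs <- summary(m)$coefficients   # estimate, std. error, t value, p-value
  ci    <- confint(m, level = 0.95)  # t-based intervals, one row per term
  vars  <- all.vars(formula(m))      # response first, then predictors
  data.frame(
    Dep_va     = vars[1],
    Ind_var    = vars[2],
    Covariates = paste(vars[-(1:2)], collapse = ", "),
    Tvalue     = coefs[2, "t value"],
    Pvalue     = coefs[2, "Pr(>|t|)"],
    low.cI     = ci[2, 1],
    upper.cI   = ci[2, 2]
  )
})

exact_results <- do.call(rbind, rows)
print(exact_results)
```

The p-values are the same as in the table above, since both come from the t statistic reported by `summary()`; only the interval bounds differ.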

Hope that helps! Let me know if you were looking for different output.
