Automate regression with specific dependent and independent variables

Question

MVE: Let this be the data set:

data <- data.frame(year = rep(seq(1966,2015,1), 8), 
               county = c(rep('prva', 50), rep('druga', 50), rep('treća', 50), rep('četvrta', 50),
                          rep('peta', 50), rep('šesta', 50), rep('sedma', 50), rep('osma', 50)),
               crime1 = runif(400), crime2 = runif(400), crime3 = runif(400), 
               uvar1 = runif(400), uvar2 = runif(400), uvar3 = runif(400),
               var1 = runif(400), var2 = runif(400), var3 = runif(400), var4 = runif(400), var5 = runif(400))

Let's say crime1, crime2 and crime3 are specific dependent variables, uvar1, uvar2 and uvar3 are specific independent variables, and var1, var2 etc. are other covariates. What I'm trying to do is automate the regressions.

Namely, I want to get the result of this code:

plm(log(crime1) ~ log(uvar1) + log(var1) + log(var2) + log(var3) + log(var4), model = 'within', effect = 'twoways', data = data)

plm(log(crime2) ~ log(uvar2) + log(var1) + log(var2) + log(var3) + log(var4), model = 'within', effect = 'twoways', data = data)

etc.; but without writing 20 lines of code for each estimated model.

By looking at similar questions, this is as far as I'd come:

crime <- c('crime1', 'crime2', 'crime3')
plm.results <- lapply(data[, crime], function(y) plm(y ~ var1 + var2 + var3 + var4, 
                                                     model = 'within', effect ='twoways', data = data))

This certainly helps with my dependent variables, but I cannot figure out how to include a specific independent variable in each of these estimations. To clarify once more: I want uvar1 to appear in the first regression only, uvar2 in the second only, and so on.

Recommended answer

The formula function is helpful when creating multiple sets of models. You can build the variations with a combination of paste0 and formula, using lapply to traverse the indices 1 to 3.

library(plm)

# remember to set.seed when sampling from distributions
# (call it before the data frame is built with runif())
set.seed(123)

# assumption: regDF is the data frame called `data` in the question
regDF = data

# a helper function to create "log(var)" from "var"
fn_appendLog = function(x) {
  paste0("log(", x, ")")
}

modelList = lapply(1:3, function(x) {

  # covariates shared by every model: all columns whose names start with "v"
  indepVars2 = Reduce(function(x, y) paste(x, y, sep = "+"),
                      lapply(colnames(regDF)[grepl("^v", colnames(regDF))], fn_appendLog))
  #> indepVars2
  #[1] "log(var1)+log(var2)+log(var3)+log(var4)+log(var5)"

  # model-specific regressor and response, e.g. uvar1 and crime1 when x = 1
  indepVars1 = fn_appendLog(paste0("uvar", x))
  depVar = fn_appendLog(paste0("crime", x))

  formulaVar = formula(paste0(depVar, " ~ ", indepVars1, "+", indepVars2))
  #> formulaVar
  #log(crime1) ~ log(uvar1) + log(var1) + log(var2) + log(var3) + log(var4) + log(var5)

  # the fitted model is the value returned to lapply
  plm(formulaVar, model = 'within', effect = 'twoways', data = regDF)
})
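
If the dependent variables and their model-specific regressors did not share a numeric suffix, the pairing could be spelled out explicitly instead. The following is a minimal sketch in base R that only builds the formulas; the names depVars, specVars, covars, wrapLog and formulas are illustrative and do not come from the original answer.

```r
# Illustrative names (not from the original answer)
depVars  <- c("crime1", "crime2", "crime3")   # responses, one per model
specVars <- c("uvar1", "uvar2", "uvar3")      # model-specific regressors
covars   <- paste0("var", 1:5)                # covariates shared by all models

# Wrap a variable name in log(), e.g. "var1" -> "log(var1)"
wrapLog <- function(x) paste0("log(", x, ")")

# Build one formula per (response, specific regressor) pair;
# Map names the result after the first vector (depVars)
formulas <- Map(function(dep, spec) {
  rhs <- paste(wrapLog(c(spec, covars)), collapse = " + ")
  as.formula(paste(wrapLog(dep), "~", rhs))
}, depVars, specVars)

print(formulas[["crime2"]])
```

Each element could then be handed to plm, for example with lapply(formulas, function(f) plm(f, model = 'within', effect = 'twoways', data = data)).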

Summary:

summary(modelList[[1]])

#> summary(modelList[[1]])
#Twoways effects Within Model
#
#Call:
#plm(formula = formulaVar, data = regDF, effect = "twoways", model = "within")
#
#Balanced Panel: n=50, T=8, N=400
#
#Residuals :
#   Min. 1st Qu.  Median 3rd Qu.    Max. 
# -5.730  -0.396   0.116   0.599   1.520 
#
#Coefficients :
#             Estimate Std. Error t-value Pr(>|t|)
#log(uvar1)  0.0393871  0.0490891  0.8024   0.4229
#log(var1)  -0.0369356  0.0541029 -0.6827   0.4953
#log(var2)  -0.0455269  0.0543664 -0.8374   0.4030
#log(var3)   0.0150516  0.0520347  0.2893   0.7726
#log(var4)  -0.0034534  0.0441506 -0.0782   0.9377
#log(var5)  -0.0109038  0.0527446 -0.2067   0.8363
#
#Total Sum of Squares:    302.23
#Residual Sum of Squares: 300.6
#R-Squared:      0.0053896
#Adj. R-Squared: 0.0045407
#F-statistic: 0.304357 on 6 and 337 DF, p-value: 0.93448
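
One detail in this output is worth checking: it reports n=50, T=8, which suggests that the 50 years were treated as the individual dimension and the 8 counties as time. When no index is supplied, plm is documented to take the first two columns of the data frame (here year, then county) as the individual and time index. The sketch below compares the default with an explicit index using plm's pdim on a made-up data frame; toy and the county labels c1 to c8 are assumptions for illustration, and the plm calls are skipped if the package is not installed.

```r
# Toy panel shaped like the question's data: 8 counties x 50 years
# (`toy` and the labels c1..c8 are illustrative, not from the question)
toy <- data.frame(year   = rep(1966:2015, 8),
                  county = rep(paste0("c", 1:8), each = 50),
                  crime1 = runif(400),
                  uvar1  = runif(400))

if (requireNamespace("plm", quietly = TRUE)) {
  # Default: the first two columns (year, county) are taken as the index
  dims_default <- plm::pdim(toy)$nT

  # Explicit: county is the individual, year is the time dimension
  dims_explicit <- plm::pdim(toy, index = c("county", "year"))$nT

  print(unlist(dims_default))
  print(unlist(dims_explicit))
}
```

If counties are meant to be the individuals, passing index = c("county", "year") to plm makes the roles explicit.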

Explanation:

The independent variables are of two types: the model-specific one (uvar1, uvar2 or uvar3) and the shared covariates var1...varN.

1) colnames(regDF)[grepl("^v",colnames(regDF))] gives us all column names in regDF that begin with the letter 'v'. The caret anchors the pattern to the start of the string (a $ would likewise anchor the end of the string, but it is not needed here). The uvar columns start with 'u', so they are not matched. The output at this stage is c("var1","var2",...,"var5")

2) We need the log variants of this variable vector, so we pass it through lapply to the function fn_appendLog, which returns list("log(var1)","log(var2)",...,"log(var5)")

3) Next, we need these elements joined into a single string of the form log(var1)+log(var2)...+log(var5)

4) To do so, we use Reduce with function(x,y) paste(x,y,sep="+"), which takes the running result and the next element of the list and joins them with "+" as the separator:

   step1 = (log(var1)+log(var2))
   step2 = (log(var1)+log(var2)) + log(var3)
   step3 = (log(var1)+log(var2)+log(var3))+ log(var4) and so on

5) Reduce applies the function across the list and accumulates the result into a single string, giving the final output log(var1)+log(var2)+log(var3)+log(var4)+log(var5)
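
Steps 1 to 5 can be traced on their own, without fitting any model. This is a small sketch in base R: cols stands in for colnames(regDF) using the column names from the question, and covars, logged, viaReduce and viaCollapse are illustrative names. It also shows that paste with collapse = "+" produces the same string as the Reduce call, which may be a shorter alternative.

```r
# Stand-in for colnames(regDF), using the column names from the question
cols <- c("year", "county", "crime1", "crime2", "crime3",
          "uvar1", "uvar2", "uvar3",
          "var1", "var2", "var3", "var4", "var5")

# Step 1: keep names starting with "v" (uvar* start with "u", so they drop out)
covars <- cols[grepl("^v", cols)]

# Step 2: wrap each name in log(), giving a list of strings
logged <- lapply(covars, function(x) paste0("log(", x, ")"))

# Steps 3-5: fold the list into one string, joining neighbours with "+"
viaReduce <- Reduce(function(x, y) paste(x, y, sep = "+"), logged)

# Alternative: paste() with collapse builds the same string in one call
viaCollapse <- paste(unlist(logged), collapse = "+")

print(viaReduce)
print(identical(viaReduce, viaCollapse))
```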

This might seem intimidating at first, but as you use these functions often and explore examples, they will become part of your repertoire in no time. The best way to learn about a function, say lapply, is to read its documentation (?lapply) end to end, run the listed examples, and tinker with the parameters until you are familiar with it. Hope this sheds some light on your query.
