Automate regression with specific dependent and independent variables
Question
Minimal example: let this be the data set:
data <- data.frame(year = rep(seq(1966,2015,1), 8),
county = c(rep('prva', 50), rep('druga', 50), rep('treća', 50), rep('četvrta', 50),
rep('peta', 50), rep('šesta', 50), rep('sedma', 50), rep('osma', 50)),
crime1 = runif(400), crime2 = runif(400), crime3 = runif(400),
uvar1 = runif(400), uvar2 = runif(400), uvar3 = runif(400),
var1 = runif(400), var2 = runif(400), var3 = runif(400), var4 = runif(400), var5 = runif(400))
Let's say crime1, crime2 and crime3 are the specific dependent variables, uvar1, uvar2 and uvar3 are the specific independent variables, and var1, var2, etc. are other covariates. What I'm trying to do is automate the regressions.
Namely, I want to get the result of this code:
plm(log(crime1) ~ log(uvar1) + log(var1) + log(var2) + log(var3) + log(var4), model = 'within', effect = 'twoways', data = data)
plm(log(crime2) ~ log(uvar2) + log(var1) + log(var2) + log(var3) + log(var4), model = 'within', effect = 'twoways', data = data)
and so on, but without writing 20 lines of code for each estimated model.
By looking at similar questions, this is as far as I have come:
crime <- c('crime1', 'crime2', 'crime3')
plm.results <- lapply(data[, crime], function(y) plm(y ~ var1 + var2 + var3 + var4,
model = 'within', effect ='twoways', data = data))
This certainly helps with my dependent variables, but I cannot figure out how to include the specific independent variable in each of these estimations. To clarify once more, I want uvar1 to appear in the first regression but not in the others, uvar2 only in the second, and so on.
Recommended answer
The formula function is helpful when creating multiple sets of models. You can build the variations using a combination of paste0 and formula, with lapply to traverse the indices 1 to 3.
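library(plm)
#remember to set.seed before sampling from distributions (i.e. before building the data set)
set.seed(123)
#regDF is assumed here to be the data frame from the question (called `data` there)
regDF = data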
#a helper function to create "log(var)" from "var"
fn_appendLog = function(x) {
paste0("log(",x,")")
}
modelList = lapply(1:3,function(x) {
indepVars2 = Reduce(function(x,y) paste(x,y,sep="+"),lapply(colnames(regDF)[grepl("^v",colnames(regDF))],fn_appendLog))
#> indepVars2
#[1] "log(var1)+log(var2)+log(var3)+log(var4)+log(var5)"
indepVars1 = fn_appendLog(paste0("uvar",x))
depVar = fn_appendLog(paste0("crime",x))
formulaVar = formula(paste0(depVar, " ~ ",indepVars1,"+", indepVars2))
#> formulaVar
#log(crime1) ~ log(uvar1) + log(var1) + log(var2) + log(var3) + log(var4) + log(var5)
modelObj = plm(formulaVar, model = 'within', effect = 'twoways', data = regDF)
})
Summary:
summary(modelList[[1]])
#> summary(modelList[[1]])
#Twoways effects Within Model
#
#Call:
#plm(formula = formulaVar, data = regDF, effect = "twoways", model = "within")
#
#Balanced Panel: n=50, T=8, N=400
#
#Residuals :
# Min. 1st Qu. Median 3rd Qu. Max.
# -5.730 -0.396 0.116 0.599 1.520
#
#Coefficients :
# Estimate Std. Error t-value Pr(>|t|)
#log(uvar1) 0.0393871 0.0490891 0.8024 0.4229
#log(var1) -0.0369356 0.0541029 -0.6827 0.4953
#log(var2) -0.0455269 0.0543664 -0.8374 0.4030
#log(var3) 0.0150516 0.0520347 0.2893 0.7726
#log(var4) -0.0034534 0.0441506 -0.0782 0.9377
#log(var5) -0.0109038 0.0527446 -0.2067 0.8363
#
#Total Sum of Squares: 302.23
#Residual Sum of Squares: 300.6
#R-Squared: 0.0053896
#Adj. R-Squared: 0.0045407
#F-statistic: 0.304357 on 6 and 337 DF, p-value: 0.93448
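Once the models are stored in a list, the coefficient on each model-specific regressor can be collected in one pass. The following is only a minimal sketch under stated assumptions: so that it runs without plm installed, it swaps in lm with county and year dummies (least-squares dummy variables, which yields the same slope estimates as the two-way within estimator). The data frame d and the names fits and key are illustrative and do not come from the original answer.

```r
set.seed(123)

# Illustrative balanced panel: 8 counties x 50 years, all variables uniform on (0, 1)
years <- 1966:2015
d <- data.frame(year   = rep(years, 8),
                county = rep(paste0("county", 1:8), each = length(years)))
for (nm in c(paste0("crime", 1:3), paste0("uvar", 1:3), paste0("var", 1:5))) {
  d[[nm]] <- runif(nrow(d))
}

# Covariates shared by every model, plus the dummies that stand in for the two-way effects
shared <- paste(c(paste0("log(var", 1:5, ")"), "factor(county)", "factor(year)"),
                collapse = " + ")

# One fit per index i: log(crime_i) on log(uvar_i) and the shared terms
fits <- lapply(1:3, function(i) {
  f <- as.formula(paste0("log(crime", i, ") ~ log(uvar", i, ") + ", shared))
  lm(f, data = d)
})
names(fits) <- paste0("crime", 1:3)

# Pull the coefficient on the model-specific regressor from each fit
key <- sapply(1:3, function(i) coef(fits[[i]])[paste0("log(uvar", i, ")")])
print(key)
```

The same sapply pattern should carry over to modelList, since coef() can also be called on plm fits.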
Explanation:
The independent variables are of two types: the model-specific one (uvar1 for the first model) and the shared covariates var1...varN.
1) colnames(regDF)[grepl("^v",colnames(regDF))] gives us a vector of all column names in regDF that begin with the letter 'v'. The caret (^) anchors the match at the start of the string (a $ would likewise anchor the end of the string, but it is not needed here). The output at this stage is c("var1","var2",...,"var5").
2) We need log variants of this variable vector, so we pass its elements through lapply to the function fn_appendLog, which returns list("log(var1)","log(var2)",...,"log(var5)").
3) Next, we need these variables combined into the single string log(var1)+log(var2)...+log(var5).
4) To do so, we use Reduce with the function paste(x,y,sep="+"). This takes the result accumulated so far and joins it to the next element of the list above, with "+" as the separator:
step1 = (log(var1)+log(var2))
step2 = (log(var1)+log(var2)) + log(var3)
step3 = (log(var1)+log(var2)+log(var3))+ log(var4) and so on
5) Reduce applies the function successively across the list and collapses it into a single string, giving the final output log(var1)+log(var2)+log(var3)+log(var4)+log(var5).
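The five steps above can be tried in isolation with base R alone. This is a minimal sketch, assuming a hypothetical vector cols that mirrors the column names of the data set in the question; the names covars, logged, rhs_reduce, rhs_collapse and formulas are illustrative. It also shows that paste with collapse = "+" produces the same string as the Reduce call.

```r
# Hypothetical column names mirroring the data set in the question
cols <- c("year", "county", paste0("crime", 1:3), paste0("uvar", 1:3), paste0("var", 1:5))

# Step 1: keep only names that start with "v" (uvar* starts with "u", so it is excluded)
covars <- cols[grepl("^v", cols)]

# Step 2: wrap each name in log(); paste0 is vectorised, so lapply is optional here
logged <- paste0("log(", covars, ")")

# Steps 3-5: join the pieces with "+"; Reduce folds pairwise, collapse does it in one call
rhs_reduce   <- Reduce(function(x, y) paste(x, y, sep = "+"), logged)
rhs_collapse <- paste(logged, collapse = "+")

# Build one formula per index i = 1..3, each with its own dependent and specific variable
formulas <- lapply(1:3, function(i) {
  as.formula(paste0("log(crime", i, ") ~ log(uvar", i, ")+", rhs_collapse))
})

print(rhs_reduce)
print(formulas[[2]])
```

Each element of formulas can then be passed as the first argument of a model-fitting call, as the answer does with plm.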
This might seem intimidating at first, but as you use these functions often and explore their examples, they will become part of your repertoire in no time. The best way to learn about a function, say lapply, is to read its documentation (?lapply) end to end, run the listed examples, and tinker with the parameters to gain familiarity. Hope this sheds some light on your query.