Looping through many multiple regressions


Question

I am trying to run the code below, taken from the post "looping with iterations over two lists of variables for a multiple regression in R", with the variable and data frame names changed to match my data. It seems to do exactly what I want and uses a very similar dataset. However, it keeps giving me an error and I don't know why, so I would really appreciate help understanding the error or the line of code that causes it.

for(i in 1:n) {
  vars = names(output)[names(output) %in% paste0(c(".PRE", ".POST"), i)]
  models[[as.character(i)]] = lm(paste("growth_rate ~ ", paste(vars, collapse=" +   ")),
                                 data = output)
}

Error in parse(text = x, keep.source = FALSE) : 
  <text>:2:0: unexpected end of input
1: growth_rate ~  
   ^

My dataset looks almost like the one in the post mentioned above, except that my "RDPI_T" and "DRY_T" variables are in alternating order (which I don't think matters in this case). The analogous variables I have are 69 PRE variables called id1.PRE, id2.PRE ... id69.PRE and 69 POST variables called id1.POST, id2.POST ... id69.POST, all in the output dataset. growth_rate is in the same dataset.

Additionally, I would like to add 2 more independent variables that are regular columns and do not come from a list: country and year. I am unsure how to incorporate them here.

Any help would be appreciated. Thank you!

Solution

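If your columns are called id1.PRE, id2.PRE, and so on, then the paste0() call in your code will not match them: paste0(c(".PRE", ".POST"), i) produces ".PRE1" and ".POST1", so vars ends up empty and lm() receives the incomplete formula "growth_rate ~ ", which is most likely what throws the error.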
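The mismatch can be checked directly by printing what each paste0() call builds. This is only a small sketch: `toy` is a made-up three-column data frame that follows the naming scheme described in the question, not the asker's actual data.

```r
# Hypothetical toy data frame using the column naming from the question
toy <- data.frame(growth_rate = 1:3, id1.PRE = 4:6, id1.POST = 7:9)
i <- 1

# The pattern from the question puts the suffix first and the index last
wrong <- paste0(c(".PRE", ".POST"), i)
print(wrong)   # ".PRE1" ".POST1"

# None of these names exist, so the %in% filter keeps nothing
vars <- names(toy)[names(toy) %in% wrong]
print(length(vars))   # 0

# With no terms, the formula string has an empty right-hand side
print(paste("growth_rate ~ ", paste(vars, collapse = " + ")))

# Building the names as "id" + index + suffix matches the real columns
right <- paste0("id", i, c(".PRE", ".POST"))
print(right)   # "id1.PRE" "id1.POST"
print(all(right %in% names(toy)))   # TRUE
```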

Please run dput(head(output)) and paste the output; this lets us see the column names and why it doesn't work.

Try something like the following, based on how you describe the column names:

#simulate data
output=data.frame(
"growth_rate"=rnorm(100),
matrix(rnorm(100*69*2),nrow=100)
)
colnames(output)[-1] = c(paste("id",1:69,".PRE",sep=""),paste("id",1:69,".POST",sep=""))
output$year = 1901:2000
output$country = sample(letters,nrow(output),replace=TRUE)

n=69
#create list to hold models
models = vector("list",n)

for(i in 1:n) {
  vars = paste0("id",i,c(".PRE", ".POST"))
  # lm() also accepts a formula string, but as.formula() makes the conversion explicit
  FORMULA = as.formula(paste("growth_rate ~ ", paste(vars, collapse=" +  ")))
  models[[i]] = lm(FORMULA,data = output)
}

If you want to include other variables:

for(i in 1:n) {
  vars = paste0("id",i,c(".PRE", ".POST"))
  # add other variables
  vars = c(vars,"country","year")
  FORMULA = as.formula(paste("growth_rate ~ ", paste(vars, collapse=" +  ")))
  models[[i]] = lm(FORMULA,data = output)
}
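As a possible variation on the loops above, the same models can be fitted with lapply() and stats::reformulate(), which builds the formula from a character vector without pasting strings. This is a sketch under assumptions, not part of the original answer: `fit_one` and `pre_post` are hypothetical names, and the simulated data uses only 5 id pairs and 4 countries to keep the output small.

```r
set.seed(1)  # fixed seed so the simulated data is reproducible

# Simulated data with the same layout as above, reduced to 5 PRE/POST pairs
n <- 5
output <- data.frame(growth_rate = rnorm(100),
                     matrix(rnorm(100 * n * 2), nrow = 100))
colnames(output)[-1] <- c(paste0("id", 1:n, ".PRE"), paste0("id", 1:n, ".POST"))
output$year <- 1901:2000
output$country <- sample(letters[1:4], nrow(output), replace = TRUE)

# fit_one (hypothetical helper): one regression for the i-th PRE/POST pair,
# with country and year added as extra predictors
fit_one <- function(i, data) {
  vars <- c(paste0("id", i, c(".PRE", ".POST")), "country", "year")
  lm(reformulate(vars, response = "growth_rate"), data = data)
}

models <- lapply(seq_len(n), fit_one, data = output)
names(models) <- paste0("id", seq_len(n))

# Collect the PRE and POST estimates, one row per model.
# The coefficient names differ between models, so they are picked by
# position: 1 is the intercept, 2 and 3 are the PRE and POST terms.
pre_post <- t(sapply(models, function(m) unname(coef(m)[2:3])))
colnames(pre_post) <- c("PRE", "POST")
print(pre_post)
```

Individual fits stay available by name, for example summary(models[["id3"]]).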
