R loop over variable names to run linear regression models

Problem description

First off, I am pretty new to this, so my method or thinking may be wrong. Using R and RStudio, I have imported an xlsx data set into a data frame. I want to loop through the column names, pick out every variable with exactly "10" in its name (the pattern _10_), and run a simple linear regression on each one. So here's my code:

indx <- grepl('_10_', colnames(data)) # logical vector: TRUE for every column name containing "_10_"
col10 <- names(data[indx]) # this gives me the names of the columns I want
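
To see what those two lines return, here is a minimal sketch on a small made-up data frame. The column names (Total_Transactions, Store_10_A, Store_100_B, Store_10_C) and the values are hypothetical stand-ins for the real spreadsheet. grepl() gives one TRUE/FALSE per column name, and indexing the names with that logical vector keeps only the matches.

```r
# Hypothetical data frame standing in for the imported xlsx data
data <- data.frame(
  Total_Transactions = c(12, 15, 11, 19, 14),
  Store_10_A = c(1, 3, 2, 5, 4),
  Store_100_B = c(2, 2, 3, 1, 4),
  Store_10_C = c(5, 4, 6, 2, 3)
)

# Logical vector: TRUE where the column name contains the literal "_10_"
indx <- grepl("_10_", colnames(data), fixed = TRUE)
print(indx)

# Keep only the names of the matching columns
col10 <- colnames(data)[indx]
print(col10)
```

Because the pattern includes the underscores on both sides, Store_100_B is not matched. fixed = TRUE makes grepl() treat the pattern as a literal string rather than a regular expression; here the result is the same either way, since _10_ contains no regex metacharacters.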

Here is the for loop I have which returns an error:

temp <- c()
for(i in 1:length(col10)){
  temp = col10[[i]]
  lm.test <- lm(Total_Transactions ~ temp[[i]], data = data)
  print(temp) #actually prints out the right column names
  i + 1
}

Is it even possible to run a loop to place those variables in the linear regression model? The error I am getting is: "Error in model.frame.default(formula = Total_Transactions ~ temp[[i]], : variable lengths differ (found for 'temp[[i]]')". If anyone could point me in the right direction I would be very grateful. Thanks.
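
The error comes from how lm() evaluates a formula: temp[[i]] is a character string holding a column name, not the column itself, so model.frame() finds a single string next to a full-length response and reports that the variable lengths differ. The sketch below reproduces this on the built-in mtcars data (a stand-in for the asker's data set, not the original code) and shows that turning the string into a formula with as.formula(paste(...)) avoids it.

```r
# `v` is not a column of mtcars, so model.frame() looks it up in the
# formula's environment and finds one character string, not a column.
v <- "cyl"
msg <- tryCatch(
  lm(mpg ~ v, data = mtcars),
  error = function(e) conditionMessage(e)
)
print(msg)  # reports that the variable lengths differ

# Building the formula from the string puts the column name into the
# formula itself, so lm() fits mpg against the cyl column.
fit <- lm(as.formula(paste("mpg ~", v)), data = mtcars)
print(coef(fit))
```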

Recommended answer

OK, I'll post an answer. I will use the dataset mtcars as an example; I believe it will work with your dataset too.
First, I create a container, lm.test, an object of class list. In your code you assign the output of lm(.) to the same variable every time through the loop, so at the end you would only have the last model; all the others would have been overwritten by the newer ones.
Then, inside the loop, I use the function reformulate to build the regression formula from the column name. Because the name then becomes part of the formula instead of being passed in as a character string, this also avoids the "variable lengths differ" error. There are other ways of doing this, but this one is simple.
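
As an illustration of what reformulate() builds, and of one of the other ways of doing it, this sketch compares it with pasting a formula string together and converting it with as.formula(), using mtcars column names.

```r
# reformulate(termlabels, response) builds a formula object from strings
f1 <- reformulate("cyl", response = "mpg")
print(f1)  # the formula mpg ~ cyl

# An alternative: paste the pieces into a string and convert it
f2 <- as.formula(paste("mpg", "~", "cyl"))
print(f2)

# Both describe the same model
stopifnot(identical(deparse(f1), deparse(f2)))

# reformulate() also accepts several term labels at once
f3 <- reformulate(c("cyl", "wt"), response = "mpg")
print(f3)  # the formula mpg ~ cyl + wt
```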

# Use just some columns
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]

lm.test <- vector("list", length(col10))

for(i in seq_along(col10)){
    lm.test[[i]] <- lm(reformulate(col10[i], "mpg"), data = data)
}

lm.test
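
An optional refinement that is not part of the original answer: naming the list elements after the predictors makes a particular model easy to retrieve, and the names carry through to results computed with lapply(). This sketch repeats the answer's mtcars setup so it runs on its own.

```r
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]

lm.test <- vector("list", length(col10))
names(lm.test) <- col10  # label each slot with its predictor

for (v in col10) {
  # Index by predictor name instead of by position
  lm.test[[v]] <- lm(reformulate(v, "mpg"), data = data)
}

# Pull out one model directly by its predictor name
print(coef(lm.test[["wt"]]))
```

Printing a whole model from this loop shows Call: lm(formula = reformulate(v, "mpg"), data = data), because lm() records the unevaluated call; formula(lm.test[["wt"]]) recovers the actual formula mpg ~ wt.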

现在您可以将结果列表用于各种事情.我建议你开始使用 lapply 和朋友们.
例如,要提取系数:

Now you can use the results list for all sorts of things. I suggest you start using lapply and friends for that.
For instance, to extract the coefficients:

cfs <- lapply(lm.test, coef)
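
The list of coefficient vectors can also be collapsed into a single table with one row per model. A minimal sketch, rebuilding the answer's mtcars models first; the column labels intercept and slope are chosen here for illustration.

```r
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]

lm.test <- vector("list", length(col10))
for (i in seq_along(col10)) {
  lm.test[[i]] <- lm(reformulate(col10[i], "mpg"), data = data)
}

# Each coef() result is a length-2 vector (intercept, slope).
# sapply() simplifies them into a 2 x k matrix; t() gives one row per model.
cf_mat <- t(sapply(lm.test, coef))
colnames(cf_mat) <- c("intercept", "slope")
rownames(cf_mat) <- col10
print(cf_mat)
```

Renaming the columns matters because each model's slope is labelled with its own predictor name, so the raw sapply() result carries only the first model's labels ("(Intercept)" and "cyl").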

To get the summaries:

smry <- lapply(lm.test, summary)
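
Each element of smry is a summary.lm object, so fit statistics can be pulled out of it and compared across models. This sketch, again on rebuilt mtcars models, collects the R-squared and the slope's p-value for every predictor; the name results is illustrative.

```r
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]

lm.test <- vector("list", length(col10))
for (i in seq_along(col10)) {
  lm.test[[i]] <- lm(reformulate(col10[i], "mpg"), data = data)
}
smry <- lapply(lm.test, summary)

# summary.lm stores R-squared in $r.squared and the coefficient table
# (Estimate, Std. Error, t value, Pr(>|t|)) in $coefficients
r2 <- sapply(smry, function(s) s$r.squared)
p_slope <- sapply(smry, function(s) s$coefficients[2, "Pr(>|t|)"])

results <- data.frame(predictor = col10, r.squared = r2, p.value = p_slope)
print(results[order(-results$r.squared), ])
```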

It becomes very simple once you're familiar with *apply functions.
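
Following that suggestion, the for loop itself can be replaced by lapply(), which creates and fills the list in one step, so no pre-allocation or index variable is needed. A minimal sketch on the same mtcars columns:

```r
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]

# One model per predictor, returned as a list in the order of col10
lm.test <- lapply(col10, function(v) lm(reformulate(v, "mpg"), data = data))
names(lm.test) <- col10

# Same fit as writing the formula by hand
stopifnot(isTRUE(all.equal(coef(lm.test[["hp"]]), coef(lm(mpg ~ hp, data = data)))))
```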
