R Loop for Variable Names to run linear regression model

Problem description

First off, I am pretty new to this, so my method or thinking may be wrong. I have imported an xlsx data set into a data frame using R and RStudio. I want to loop through the column names to pick out all of the variables whose names contain exactly "10", and then run a simple linear regression on each of them. Here's my code:

indx <- grepl('_10_', colnames(data)) # logical vector: TRUE where the column name contains "_10_"
col10 <- names(data[indx]) # this gives me the names of the columns I want

Here is the for loop I have which returns an error:

temp <- c()
for(i in 1:length(col10)){
   temp = col10[[i]]
  lm.test <- lm(Total_Transactions ~ temp[[i]], data = data)
  print(temp) #actually prints out the right column names
  i + 1
}

Is it even possible to run a loop to place those variables in the linear regression model? The error I am getting is: "Error in model.frame.default(formula = Total_Transactions ~ temp[[i]], : variable lengths differ (found for 'temp[[i]]')". If anyone could point me in the right direction I would be very grateful. Thanks.

Recommended answer

Ok, I'll post an answer. I will use the dataset mtcars as an example. I believe it will work with your dataset.
First, I create a store, lm.test, an object of class list. In your code you assign the output of lm(.) to the same object every time through the loop, so in the end you would only have the last fit; all the others would have been overwritten by the newer ones.
Then, inside the loop, I use the function reformulate to put together the regression formula. There are other ways of doing this, but this one is simple.

# Use just some columns
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]

lm.test <- vector("list", length(col10))

for(i in seq_along(col10)){
    lm.test[[i]] <- lm(reformulate(col10[i], "mpg"), data = data)
}

lm.test
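As one of the "other ways" mentioned above, the formula can also be pasted together as a string and converted with as.formula. The sketch below reuses the mtcars columns from the answer; the names fits_alt and f are made up for illustration. It also hints at why the loop in the question failed: there, temp[[i]] is a single character string (a column name), not the column itself, so lm() is handed a variable of length 1 next to a response with one value per row.

```r
# Minimal sketch: build each formula from a string instead of reformulate()
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]

# fits_alt and f are illustrative names, not from the original answer
fits_alt <- vector("list", length(col10))

for (i in seq_along(col10)) {
    # e.g. "mpg ~ cyl" becomes the formula mpg ~ cyl
    f <- as.formula(paste("mpg ~", col10[i]))
    fits_alt[[i]] <- lm(f, data = data)
}

# Each element is an ordinary lm fit
coef(fits_alt[[1]])
```

For the data in the question, the response name would presumably be "Total_Transactions" in place of "mpg".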

Now you can use the results list for all sorts of things. I suggest you start using lapply and friends for that.
For instance, to extract the coefficients:

cfs <- lapply(lm.test, coef)
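Because every model here has exactly one intercept and one slope, the list of coefficient vectors can be stacked into a small table. This is only a sketch under that assumption; cf_table and its column labels are illustrative names.

```r
# Rebuild the fits from the answer so this block runs on its own
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]

lm.test <- vector("list", length(col10))
for (i in seq_along(col10)) {
    lm.test[[i]] <- lm(reformulate(col10[i], "mpg"), data = data)
}

cfs <- lapply(lm.test, coef)
names(cfs) <- col10          # label each result with its predictor

# One row per model; columns are the intercept and the slope
cf_table <- do.call(rbind, cfs)
colnames(cf_table) <- c("intercept", "slope")
cf_table
```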

To get the summaries:

smry <- lapply(lm.test, summary)
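Individual statistics can then be pulled out of each summary object. As a sketch, the snippet below collects R-squared for every model and sorts the predictors by it; r2 is an illustrative name.

```r
# Rebuild the fits from the answer so this block runs on its own
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]

lm.test <- vector("list", length(col10))
for (i in seq_along(col10)) {
    lm.test[[i]] <- lm(reformulate(col10[i], "mpg"), data = data)
}

smry <- lapply(lm.test, summary)

# summary.lm objects store R-squared in the r.squared component
r2 <- sapply(smry, function(s) s$r.squared)
names(r2) <- col10

# Predictors ordered from best to worst fit
r2[order(r2, decreasing = TRUE)]
```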

It becomes very simple once you're familiar with *apply functions.
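As an illustration of that point, the for loop itself can be replaced by a single lapply call. This is a sketch, not the only way to write it; fits is an illustrative name, and setNames is used so that each model can be looked up by its predictor.

```r
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]

# lapply keeps the names of its input, so the result is a named list
fits <- lapply(setNames(col10, col10), function(v) {
    lm(reformulate(v, "mpg"), data = data)
})

names(fits)   # one entry per predictor
fits$wt       # the model mpg ~ wt
```

No pre-allocated list or index variable is needed, since lapply collects the fits itself.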
