Splitting data and running a linear regression loop

Question

I have seen a lot of similar questions, but there is one key piece of the loop I am trying to write that I am missing. I have a dataset with ~4,000 different keys, and each key has ~1,000 observations. I filtered the data down to a single key, ran a linear regression on its observations, checked the model assumptions, and everything looks good. Now I want to loop over the dataset and run that linear regression for each key, then store the coefficients, p-values, R^2, etc. so I can review them together.
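For reference, the coefficients, p-values and R^2 mentioned here can all be pulled from summary() of a single lm fit. This is a minimal base-R sketch, assuming the sample data shown below is stored in a data frame called data (rebuilt inline so the snippet runs on its own); it uses subset() instead of dplyr::filter() so no extra packages are needed.

```r
# Sample data from the question (assumed to be called `data`)
data <- data.frame(
  Key = rep(c("A", "B", "C"), each = 3),
  y1  = 10:18,
  x1  = 1:9,
  x2  = c(3, 4, 5, 6, 7, 8, 9, 1, 2),
  stringsAsFactors = FALSE
)

# Fit the model for a single key
fitA <- lm(y1 ~ x1 + x2, data = subset(data, Key == "A"))
s <- summary(fitA)  # the toy data fit exactly, so R may warn "essentially perfect fit"

coef(s)                # estimates, std. errors, t values, p-values (aliased terms omitted)
coef(s)[, "Pr(>|t|)"]  # p-values only
s$r.squared            # R^2
s$adj.r.squared        # adjusted R^2
```

In key A, x2 is exactly x1 + 2, so coef(fitA) reports x2 as NA and the summary table leaves it out.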

Here is a sample of my data:

Key y1 x1 x2
A   10 1  3
A   11 2  4 
A   12 3  5
B   13 4  6 
B   14 5  7
B   15 6  8
C   16 7  9 
C   17 8  1
C   18 9  2
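To make the code in this thread reproducible, the sample above can be rebuilt as an R data frame. This sketch assumes the object is called data (the name used in the code below) and that the columns are Key, y1, x1 and x2 exactly as shown; rep(..., each = 3) simply repeats each key three times.

```r
# Rebuild the sample table as a data frame named `data`
data <- data.frame(
  Key = rep(c("A", "B", "C"), each = 3),
  y1  = 10:18,
  x1  = 1:9,
  x2  = c(3, 4, 5, 6, 7, 8, 9, 1, 2),
  stringsAsFactors = FALSE  # keep Key as character on R < 4.0 too
)
str(data)
```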

For a single key, I have run:

library(dplyr)
datA <- data %>% filter(Key == 'A')  # keep only the rows for key A (column is Key, not key)
lm(y1 ~ x1 + x2, data = datA)

I then repeated that for keys B and C. Every question I have seen here loops over different variables across the whole dataset, rather than splitting the data by rows.
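Splitting by rows is what base R's split() does: given a grouping vector, it returns a named list with one data frame per key, which is a structure a loop or lapply() can walk over. A small sketch on the sample data, rebuilt inline and assumed to be called data:

```r
# Sample data from the question (assumed to be called `data`)
data <- data.frame(
  Key = rep(c("A", "B", "C"), each = 3),
  y1  = 10:18,
  x1  = 1:9,
  x2  = c(3, 4, 5, 6, 7, 8, 9, 1, 2),
  stringsAsFactors = FALSE
)

groups <- split(data, data$Key)  # named list: one data frame per Key value
names(groups)                    # "A" "B" "C"
sapply(groups, nrow)             # number of observations per key
groups[["B"]]                    # only the rows where Key == "B"
```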

But I need to do this about 4,000 more times. Any help writing this loop would be greatly appreciated (I am terrible at writing loops).
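Since an explicit loop was asked for, here is a minimal base-R sketch of a for loop over the keys, assuming the data frame is called data with the columns shown above. It preallocates a list, stores one small data frame of results per key, and stacks them once at the end; calling rbind() on a growing data frame inside the loop would also work but copies the whole table on every iteration, which adds up with ~4,000 keys. The result columns (Key, term, estimate, p.value, r.squared) are illustrative choices, not a fixed format.

```r
# Sample data from the question (assumed to be called `data`)
data <- data.frame(
  Key = rep(c("A", "B", "C"), each = 3),
  y1  = 10:18,
  x1  = 1:9,
  x2  = c(3, 4, 5, 6, 7, 8, 9, 1, 2),
  stringsAsFactors = FALSE
)

keys <- unique(data$Key)
results <- vector("list", length(keys))  # preallocate one slot per key
names(results) <- keys

for (k in keys) {
  # Fit the same model on the rows belonging to key k
  fit <- lm(y1 ~ x1 + x2, data = data[data$Key == k, ])
  s <- summary(fit)
  cf <- coef(s)  # one row per non-aliased term
  results[[k]] <- data.frame(
    Key       = k,
    term      = rownames(cf),
    estimate  = cf[, "Estimate"],
    p.value   = cf[, "Pr(>|t|)"],
    r.squared = s$r.squared,
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Stack the per-key tables into a single data frame
results_df <- do.call(rbind, results)
results_df
```

Each key contributes one row per estimable coefficient, so results_df can be filtered or sorted by Key, term or p.value afterwards.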

Answer

Split the data by Key, fit one model per group with lapply(), and then use the broom package to tidy the output into a more readable form.

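library(broom)
library(tibble)  # provides as_tibble()

# split() returns a named list of data frames, one per Key; fit lm() to each
list_models <- lapply(split(data, data$Key), function(x) lm(y1 ~ x1 + x2, data = x))

# tidy() turns each model into a coefficient table; rbind() stacks them
as_tibble(do.call(rbind, lapply(list_models, broom::tidy)))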

# A tibble: 7 x 5
  term        estimate  std.error statistic    p.value
  <chr>          <dbl>      <dbl>     <dbl>      <dbl>
1 (Intercept) 9.00e+ 0   2.22e-15   4.05e15   1.57e-16
2 x1          1.00e+ 0   1.03e-15   9.73e14   6.54e-16
3 (Intercept) 9.00e+ 0   4.59e-15   1.96e15   3.25e-16
4 x1          1.00e+ 0   9.06e-16   1.10e15   5.77e-16
5 (Intercept) 9.00e+ 0 NaN        NaN       NaN       
6 x1          1.00e+ 0 NaN        NaN       NaN       
7 x2          3.02e-16 NaN        NaN       NaN  
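The stacked table above no longer records which key each row came from, and it has no R^2. A sketch that keeps the key and also collects model-level statistics, assuming dplyr is installed for bind_rows() (its .id argument turns the list names, i.e. the keys, into a column) and using broom::glance() to get one row of fit statistics (r.squared, adj.r.squared, statistic, p.value, ...) per model:

```r
library(broom)
library(dplyr)

# Sample data from the question (assumed to be called `data`)
data <- data.frame(
  Key = rep(c("A", "B", "C"), each = 3),
  y1  = 10:18,
  x1  = 1:9,
  x2  = c(3, 4, 5, 6, 7, 8, 9, 1, 2),
  stringsAsFactors = FALSE
)

list_models <- lapply(split(data, data$Key), function(x) lm(y1 ~ x1 + x2, data = x))

# One row per coefficient, with a Key column taken from the list names
coefs <- bind_rows(lapply(list_models, tidy), .id = "Key")

# One row per model with R^2, adjusted R^2, F statistic and its p-value, etc.
fit_stats <- bind_rows(lapply(list_models, glance), .id = "Key")
```

On this toy data the numbers are not meaningful: within A and B, x2 is exactly x1 + 2, so lm() reports its coefficient as NA (aliased) and tidy() omits it, while C has three observations for three coefficients, leaving no residual degrees of freedom and hence NaN standard errors and p-values. With ~1,000 observations per key this should only come up if predictors are collinear within a key.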
