Running single linear regressions across multiple variables, in groups

Question

I'm trying to run a simple single linear regression over a large number of variables, grouped according to another variable. Using the mtcars dataset as an example, I'd like to run a separate linear regression between mpg and each other variable (mpg ~ disp, mpg ~ hp, etc.), grouped by another variable (for example, cyl).

Running lm over each variable independently can easily be done using purrr::map (modified from this great tutorial - https://sebastiansauer.github.io/EDIT-multiple_lm_purrr_EDIT/):

library(dplyr)
library(tidyr)
library(purrr)
library(broom)   # provides glance()

mtcars %>%
  select(-mpg) %>% # exclude outcome, leave predictors
  map(~ lm(mtcars$mpg ~ .x, data = mtcars)) %>% # one model per predictor column
  map_df(glance, .id = 'variable') %>% # one row of fit statistics per model
  select(variable, r.squared, p.value)

# A tibble: 10 x 3
   variable r.squared  p.value
   <chr>        <dbl>    <dbl>
 1 cyl          0.726 6.11e-10
 2 disp         0.718 9.38e-10
 3 hp           0.602 1.79e- 7
 4 drat         0.464 1.78e- 5
 5 wt           0.753 1.29e-10
 6 qsec         0.175 1.71e- 2
 7 vs           0.441 3.42e- 5
 8 am           0.360 2.85e- 4
 9 gear         0.231 5.40e- 3
10 carb         0.304 1.08e- 3
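
The part-1 approach works because `.x` is a single column vector, while `mtcars$mpg` is tied to the full data set. A minimal base-R sketch of the same loop, assuming only the built-in `mtcars` data, builds each formula from a column *name* with `reformulate()` instead, so the formula is resolved inside whatever data frame is passed as `data` (for example, one `cyl` group). `predictors` and `r2` are illustrative names, not part of the original code.

```r
# Predictor names: every column except the outcome mpg
predictors <- setdiff(names(mtcars), "mpg")

# Fit mpg ~ <predictor> for each name; the formula refers to columns by name,
# so it is looked up in the data frame passed to `data`
r2 <- vapply(predictors, function(iv) {
  model <- lm(reformulate(iv, response = "mpg"), data = mtcars)
  summary(model)$r.squared
}, numeric(1))

round(r2, 3)
```

Since these are the same ten models, the rounded values should agree with the `r.squared` column of the table above.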

And running a linear model within each group is also easy using map:

mtcars %>%
  split(.$cyl) %>% # split by grouping variable
  map(~ lm(mpg ~ wt, data = .)) %>% # same model fitted within each cyl group
  map_df(broom::glance, .id = 'cyl') %>%
  select(cyl, r.squared, p.value) # no `variable` column exists here

# A tibble: 3 x 3
  cyl   r.squared p.value
  <chr>     <dbl>   <dbl>
1 4         0.509  0.0137
2 6         0.465  0.0918
3 8         0.423  0.0118

So I can run by variable, or by group. However, I can't figure out how to combine the two (grouping everything by cyl, then running lm(mpg ~ each other variable) separately). I'd hoped to do something like this:

mtcars %>%
  select(-mpg) %>% #exclude outcome, leave predictors
  split(.$cyl) %>% # group by grouping variable
  map(~ lm(mtcars$mpg ~ .x, data = mtcars)) %>% #run lm across all variables
  map_df(glance, .id='cyl') %>%
  select(cyl, variable, r.squared, p.value)

and get a result that gives me cyl (group), variable, r.squared, and p.value (a combination of 3 groups x 10 variables = 30 model outputs).

But split() turns the data frame into a list of data frames, which the construction from part 1 [map(~ lm(mtcars$mpg ~ .x, data = mtcars))] can't handle. I have tried to modify it so that it doesn't explicitly refer to the original data structure, but can't figure out a working solution. Any help is greatly appreciated!
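
A quick base-R look at what `split()` returns shows why the part-1 construction no longer fits. This sketch uses only the built-in `mtcars`; `groups` is an illustrative name.

```r
groups <- split(mtcars, mtcars$cyl)

class(groups)                      # "list": one data frame per cyl value
names(groups)                      # "4" "6" "8"
vapply(groups, nrow, integer(1))   # 11, 7 and 14 rows

# mtcars$mpg still has all 32 rows, so it cannot be paired with the
# columns of any single group
length(mtcars$mpg)
```

Mapping over this list therefore iterates over the three groups, and inside each step `.x` is a whole data frame rather than one predictor column.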

Recommended answer

IIUC, you can use group_by and group_modify, with a map inside that iterates over predictors.

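If you can isolate your predictor variables in advance, it'll make things easier, as with ivs in this solution. Here ivs holds columns 3 to 11 of mtcars (disp through carb), which leaves out both the outcome mpg and the grouping variable cyl, so the result has 3 groups x 9 predictors = 27 rows rather than 30.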

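library(tidyverse)  # broom is installed with the tidyverse but not attached, hence broom::

# Predictor names: columns 3 to 11 (disp ... carb), i.e. everything except
# the outcome mpg and the grouping variable cyl
ivs <- colnames(mtcars)[3:ncol(mtcars)]
names(ivs) <- ivs  # named vector, so map_df(.id = "iv") labels each row

mtcars %>% 
  group_by(cyl) %>% 
  group_modify(function(data, key) {
    # `data` is one cyl group (without the cyl column itself)
    map_df(ivs, function(iv) {
      frml <- as.formula(paste("mpg", "~", iv))  # e.g. mpg ~ disp
      lm(frml, data = data) %>% broom::glance()
      }, .id = "iv") 
  }) %>% 
  select(cyl, iv, r.squared, p.value)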

# A tibble: 27 × 4
# Groups:   cyl [3]
     cyl iv    r.squared  p.value
   <dbl> <chr>     <dbl>    <dbl>
 1     4 disp  0.648      0.00278
 2     4 hp    0.274      0.0984 
 3     4 drat  0.180      0.193  
 4     4 wt    0.509      0.0137 
 5     4 qsec  0.0557     0.485  
 6     4 vs    0.00238    0.887  
 7     4 am    0.287      0.0892 
 8     4 gear  0.115      0.308  
 9     4 carb  0.0378     0.567  
10     6 disp  0.0106     0.826  
11     6 hp    0.0161     0.786  
# ...
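
For readers who would rather avoid the tidyverse dependencies, the same 27 models can be sketched in base R by combining the two ideas from the question: `split()` for the groups, and a loop over predictor names inside each group. This is an alternative sketch, not the answer's method; `fit_stats`, `by_cyl` and `result` are hypothetical names, and the p-value computed here is the overall F-test p-value, which is the one `broom::glance()` reports for an `lm` fit.

```r
# Hypothetical helper: R-squared and overall F-test p-value of one lm fit
fit_stats <- function(model) {
  s <- summary(model)
  f <- s$fstatistic  # NULL when only the intercept could be estimated
  p <- if (is.null(f)) NA_real_ else unname(pf(f[1], f[2], f[3], lower.tail = FALSE))
  c(r.squared = s$r.squared, p.value = p)
}

ivs <- names(mtcars)[3:ncol(mtcars)]    # disp ... carb (skips mpg and cyl)
by_cyl <- split(mtcars, mtcars$cyl)     # list of data frames, one per cyl

rows <- lapply(names(by_cyl), function(g) {
  d <- by_cyl[[g]]
  # vapply gives one column per predictor (rows r.squared, p.value); t() flips it
  stats <- t(vapply(ivs, function(iv) {
    fit_stats(lm(reformulate(iv, response = "mpg"), data = d))
  }, numeric(2)))
  data.frame(cyl = g, iv = ivs, stats, row.names = NULL)
})

result <- do.call(rbind, rows)
head(result)
```

Where a predictor is constant within a group (for example, `vs` is 0 for every 8-cylinder car), `lm()` can only estimate the intercept, so `summary()` reports an R-squared of 0 and no F statistic; the helper returns `NA` for the p-value in that case.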
