使用来自大型数据帧的单个预测器在单个响应的 R 中执行线性模型,并对每一列重复 [英] Performing a linear model in R of a single response with a single predictor from a large dataframe and repeat for each column

查看:24
本文介绍了使用来自大型数据帧的单个预测器在单个响应的 R 中执行线性模型,并对每一列重复的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

标题可能不是很清楚,但我想做的是:

It might not be very clear from the title but what I wish to do is:

  1. 我有一个包含 200 列的数据框 df,前 80 列是响应变量(y1、y2、y3、...),其余 120 个是预测变量(x1、x2、x3、...).

  1. I have a dataframe df with, say, 200 columns and the first 80 columns are response variables (y1, y2, y3, ...) and the rest of 120 are predictors (x1, x2, x3, ...).

我希望为每一对计算一个线性模型——lm(yi ~ xi, data = df).

I wish to compute a linear model for each pair – lm(yi ~ xi, data = df).

使用 lapply() 及其相关函数,我在网上查看的许多问题和解决方案要么是固定响应,要么是多个预测变量,或者相反.

Many problems and solutions I have looked through online have a either a fixed response vs many predictors or the other way around, using lapply() and its related functions.

熟悉它的人能指出我正确的步骤吗?

Could anyone who is familiar with it point me to the right step?

推荐答案

use tidyverse

library(tidyverse)
library(broom)
df <- mtcars
y <- names(df)[1:3]
x <- names(df)[4:7]

result <- expand_grid(x, y) %>% 
  rowwise() %>% 
  mutate(frm = list(reformulate(x, y)),
         model = list(lm(frm, data = df))) 

result$model <- purrr::set_names(result$model, nm = paste0(result$y, " ~ ", result$x))

result$model[1:2]
#> $`mpg ~ hp`
#> 
#> Call:
#> lm(formula = frm, data = df)
#> 
#> Coefficients:
#> (Intercept)           hp  
#>    30.09886     -0.06823  
#> 
#> 
#> $`cyl ~ hp`
#> 
#> Call:
#> lm(formula = frm, data = df)
#> 
#> Coefficients:
#> (Intercept)           hp  
#>     3.00680      0.02168

map_df(result$model, tidy)
#> # A tibble: 24 x 5
#>    term        estimate std.error statistic  p.value
#>    <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#>  1 (Intercept)  30.1      1.63       18.4   6.64e-18
#>  2 hp           -0.0682   0.0101     -6.74  1.79e- 7
#>  3 (Intercept)   3.01     0.425       7.07  7.41e- 8
#>  4 hp            0.0217   0.00264     8.23  3.48e- 9
#>  5 (Intercept)  21.0     32.6         0.644 5.25e- 1
#>  6 hp            1.43     0.202       7.08  7.14e- 8
#>  7 (Intercept)  -7.52     5.48       -1.37  1.80e- 1
#>  8 drat          7.68     1.51        5.10  1.78e- 5
#>  9 (Intercept)  14.6      1.58        9.22  2.93e-10
#> 10 drat         -2.34     0.436      -5.37  8.24e- 6
#> # ... with 14 more rows

map_df(result$model, glance)
#> # A tibble: 12 x 12
#>    r.squared adj.r.squared  sigma statistic  p.value    df logLik   AIC   BIC
#>        <dbl>         <dbl>  <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
#>  1     0.602         0.589   3.86     45.5  1.79e- 7     1  -87.6 181.  186. 
#>  2     0.693         0.683   1.01     67.7  3.48e- 9     1  -44.6  95.1  99.5
#>  3     0.626         0.613  77.1      50.1  7.14e- 8     1 -183.  373.  377. 
#>  4     0.464         0.446   4.49     26.0  1.78e- 5     1  -92.4 191.  195. 
#>  5     0.490         0.473   1.30     28.8  8.24e- 6     1  -52.7 111.  116. 
#>  6     0.504         0.488  88.7      30.5  5.28e- 6     1 -188.  382.  386. 
#>  7     0.753         0.745   3.05     91.4  1.29e-10     1  -80.0 166.  170. 
#>  8     0.612         0.599   1.13     47.4  1.22e- 7     1  -48.3 103.  107. 
#>  9     0.789         0.781  57.9     112.   1.22e-11     1 -174.  355.  359. 
#> 10     0.175         0.148   5.56      6.38 1.71e- 2     1  -99.3 205.  209. 
#> 11     0.350         0.328   1.46     16.1  3.66e- 4     1  -56.6 119.  124. 
#> 12     0.188         0.161 114.        6.95 1.31e- 2     1 -196.  398.  402. 
#> # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

reprex 包 (v0.3.0) 于 2020 年 12 月 11 日创建

Created on 2020-12-11 by the reprex package (v0.3.0)

这篇关于使用来自大型数据帧的单个预测器在单个响应的 R 中执行线性模型,并对每一列重复的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆