使用大型数据框中的单个预测变量在R中对单个响应执行线性模型,并为每一列重复 [英] Performing a linear model in R of a single response with a single predictor from a large dataframe and repeat for each column

查看:54
本文介绍了使用大型数据框中的单个预测变量在R中对单个响应执行线性模型,并为每一列重复的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

标题可能不太清楚,但我想做的是:

It might not be very clear from the title but what I wish to do is:

  1. 我有一个数据帧df,例如200列,前80列是响应变量(y1,y2,y3等),其余120个是预测变量(x1,x2,x3,...).

  1. I have a dataframe df with, say, 200 columns and the first 80 columns are response variables (y1, y2, y3, ...) and the rest of 120 are predictors (x1, x2, x3, ...).

我希望为每对计算一个线性模型– lm(yi〜xi,data = df).

I wish to compute a linear model for each pair – lm(yi ~ xi, data = df).

我在网上浏览过的许多问题和解决方案使用 lapply()及其相关功能,对许多预测变量的响应都是固定的或相反的.

Many problems and solutions I have looked through online have a either a fixed response vs many predictors or the other way around, using lapply() and its related functions.

熟悉它的人能指出我正确的步骤吗?

Could anyone who is familiar with it point me to the right step?

推荐答案

使用 tidyverse

library(tidyverse)
library(broom)
df <- mtcars
y <- names(df)[1:3]
x <- names(df)[4:7]

result <- expand_grid(x, y) %>% 
  rowwise() %>% 
  mutate(frm = list(reformulate(x, y)),
         model = list(lm(frm, data = df))) 

result$model <- purrr::set_names(result$model, nm = paste0(result$y, " ~ ", result$x))

result$model[1:2]
#> $`mpg ~ hp`
#> 
#> Call:
#> lm(formula = frm, data = df)
#> 
#> Coefficients:
#> (Intercept)           hp  
#>    30.09886     -0.06823  
#> 
#> 
#> $`cyl ~ hp`
#> 
#> Call:
#> lm(formula = frm, data = df)
#> 
#> Coefficients:
#> (Intercept)           hp  
#>     3.00680      0.02168

map_df(result$model, tidy)
#> # A tibble: 24 x 5
#>    term        estimate std.error statistic  p.value
#>    <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#>  1 (Intercept)  30.1      1.63       18.4   6.64e-18
#>  2 hp           -0.0682   0.0101     -6.74  1.79e- 7
#>  3 (Intercept)   3.01     0.425       7.07  7.41e- 8
#>  4 hp            0.0217   0.00264     8.23  3.48e- 9
#>  5 (Intercept)  21.0     32.6         0.644 5.25e- 1
#>  6 hp            1.43     0.202       7.08  7.14e- 8
#>  7 (Intercept)  -7.52     5.48       -1.37  1.80e- 1
#>  8 drat          7.68     1.51        5.10  1.78e- 5
#>  9 (Intercept)  14.6      1.58        9.22  2.93e-10
#> 10 drat         -2.34     0.436      -5.37  8.24e- 6
#> # ... with 14 more rows

map_df(result$model, glance)
#> # A tibble: 12 x 12
#>    r.squared adj.r.squared  sigma statistic  p.value    df logLik   AIC   BIC
#>        <dbl>         <dbl>  <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
#>  1     0.602         0.589   3.86     45.5  1.79e- 7     1  -87.6 181.  186. 
#>  2     0.693         0.683   1.01     67.7  3.48e- 9     1  -44.6  95.1  99.5
#>  3     0.626         0.613  77.1      50.1  7.14e- 8     1 -183.  373.  377. 
#>  4     0.464         0.446   4.49     26.0  1.78e- 5     1  -92.4 191.  195. 
#>  5     0.490         0.473   1.30     28.8  8.24e- 6     1  -52.7 111.  116. 
#>  6     0.504         0.488  88.7      30.5  5.28e- 6     1 -188.  382.  386. 
#>  7     0.753         0.745   3.05     91.4  1.29e-10     1  -80.0 166.  170. 
#>  8     0.612         0.599   1.13     47.4  1.22e- 7     1  -48.3 103.  107. 
#>  9     0.789         0.781  57.9     112.   1.22e-11     1 -174.  355.  359. 
#> 10     0.175         0.148   5.56      6.38 1.71e- 2     1  -99.3 205.  209. 
#> 11     0.350         0.328   1.46     16.1  3.66e- 4     1  -56.6 119.  124. 
#> 12     0.188         0.161 114.        6.95 1.31e- 2     1 -196.  398.  402. 
#> # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

reprex软件包(v0.3.0)创建于2020-12-11 sup>

Created on 2020-12-11 by the reprex package (v0.3.0)

这篇关于使用大型数据框中的单个预测变量在R中对单个响应执行线性模型,并为每一列重复的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆