How do I apply a regression to subgroups within the population?


Question

Suppose I have the following data frame:

weight <- c(100, 137, 158, 225, 149)
age <- c(15, 18, 21, 31, 65)
gender <- c("Female", "Male", "Male", "Male", "Female")
table <- data.frame(weight, age, gender)

To fit a linear regression that shows how weight predicts age, and then inspect the result, I would do:

allData <- lm(age ~ weight, data = table)
summary(allData)

What should I do if I want to examine how weight predicts age for females only, that is, fit the model using only the female rows? I'm thinking of something like:

FemaleData <- lm(age ~ weight, data=table (gender="Female"))

Recommended answer

library(dplyr)
library(broom)

# example dataset
weight <- c(100, 137, 158, 225, 149, 148)
age <- c(15, 18, 21, 31, 65, 64)
gender <- c("Female", "Male", "Male", "Male", "Female", "Female")
table <- data.frame(weight, age, gender)

# build model for each gender value and store it in a column
table %>%
  group_by(gender) %>%                                  # for each gender value
  do(model = summary(lm(age ~ weight, data = .))) %>%   # build a model
  ungroup() -> tbl_models

# check what your new dataset looks like
tbl_models

# # A tibble: 2 x 2
#     gender            model
#   * <fctr>           <list>
#   1 Female <S3: summary.lm>
#   2   Male <S3: summary.lm>

# access / view model for Females
tbl_models %>% filter(gender == "Female") %>% pull(model)

# [[1]]
# 
# Call:
#   lm(formula = age ~ weight, data = .)
# 
# Residuals:
#   1          2          3 
# -0.0002125 -0.0101997  0.0104122 
# 
# Coefficients:
#                 Estimate Std. Error t value Pr(>|t|)    
#   (Intercept) -8.706e+01  4.943e-02   -1761 0.000361 ***
#   weight       1.021e+00  3.681e-04    2773 0.000230 ***
#   ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.01458 on 1 degrees of freedom
# Multiple R-squared:      1,   Adjusted R-squared:      1 
# F-statistic: 7.69e+06 on 1 and 1 DF,  p-value: 0.0002296
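
If only one subgroup is needed, the grouped pipeline can be skipped. The following is a minimal base R sketch, not part of the original answer, that fits the female-only model through the `subset` argument of `lm()`. The data frame name `people` and the object names `female_fit` and `female_fit2` are hypothetical, chosen here so that base R's `table()` function is not masked.

```r
# Same example data as the answer above, stored under a hypothetical name
people <- data.frame(
  weight = c(100, 137, 158, 225, 149, 148),
  age    = c(15, 18, 21, 31, 65, 64),
  gender = c("Female", "Male", "Male", "Male", "Female", "Female")
)

# lm() evaluates `subset` inside `data`, so only the female rows are used
female_fit <- lm(age ~ weight, data = people, subset = gender == "Female")
summary(female_fit)

# Equivalent: filter the rows first, then fit on the filtered data frame
female_fit2 <- lm(age ~ weight, data = people[people$gender == "Female", ])
coef(female_fit2)
```

Both calls should give the same coefficients as the Female rows produced by the grouped approach, since they fit the same three observations.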

# build model for each gender value and store it as a tidy dataset
table %>%
  group_by(gender) %>%
  do(tidy(lm(age ~ weight, data = .))) %>%
  ungroup()

# # A tibble: 4 x 6
#   gender        term    estimate    std.error   statistic      p.value
#   <fctr>       <chr>       <dbl>        <dbl>       <dbl>        <dbl>
# 1 Female (Intercept) -87.0609860 0.0494272875 -1761.39518 0.0003614292
# 2 Female      weight   1.0206120 0.0003680516  2773.01334 0.0002295769
# 3   Male (Intercept)  -2.3370680 0.2181313917   -10.71404 0.0592475719
# 4   Male      weight   0.1480985 0.0012299556   120.40961 0.0052869963
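
The per-group fit above can also be sketched without dplyr or broom. This is an assumed base R alternative, not the answer's own method: `split()` breaks the data frame into one piece per gender and `lapply()` fits a model on each piece. The names `people`, `by_gender`, `fits` and `coef_table` are hypothetical.

```r
# Same example data as the answer above, stored under a hypothetical name
people <- data.frame(
  weight = c(100, 137, 158, 225, 149, 148),
  age    = c(15, 18, 21, 31, 65, 64),
  gender = c("Female", "Male", "Male", "Male", "Female", "Female")
)

# One data frame per gender value, returned as a named list
by_gender <- split(people, people$gender)

# Fit the same regression on each piece; the list keeps the group names
fits <- lapply(by_gender, function(d) lm(age ~ weight, data = d))

# Collect the coefficients into a matrix with one row per group
coef_table <- t(sapply(fits, coef))
coef_table

# The full summary for a single group is still available by name
summary(fits[["Female"]])
```

Keeping the fitted `lm` objects in a named list makes it possible to call `predict()` or `residuals()` on any group later, which the tidy coefficient table alone does not allow.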
