如何对线性回归进行Bootstrap和估计R中的置信区间？ [英] How to bootstrap a linear regression and estimate confidence intervals in R?

查看：13 发布时间：2022/3/24 21:40:43 r linear-regression confidence-interval statistics-bootstrap

本文介绍了如何对线性回归进行Bootstrap和估计R中的置信区间？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个执行引导的数据集，以便仅替换两个主要因素Replicate/Level内的值。

replicate level high.density low.density
    1     low    14          36
    1     low    54          31
    1     mid    82          10
    1     mid    24          NA
    2     low    12          28
    2     low    11          45
    2     mid    12          17
    2     mid    NA          24
    2      up    40          10
    2      up    NA           5
    2      up    20           2

##数据帧

 df <- structure(list(replicate = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2), level = c("low", "low", "mid", "mid", "low", "low", "mid", "mid", "up", "up", "up"), high.density = c(14, 54, 82, 24, 12, 11, 12, NA, 40, NA, 20), low.density = c(36, 31, 10, 
    NA, 28, 45, 17, 24, 10, 5, 2)), class = c("spec_tbl_df","tbl_df","tbl", "data.frame"), row.names = c(NA, -11L), spec = structure(list(cols = list(replicate = structure(list(), class = c("collector_double", "collector")), level = structure(list(), class = c("collector_character","collector")), high.density = structure(list(), class = c("collector_double","collector")), low.density = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", "collector")), skip = 1L), class = "col_spec"))
    
    df$replicate <- as.factor(as.numeric(df$replicate))
    df$level <- as.factor(as.character(df$level))

##创建引导所需的数据集。仅允许对唯一复制/级别中带有-的值进行置乱(Credits: Dion)

df_shuffle <- function(DF) {
  my_split <- split(DF, f = ~ DF$replicate + DF$level)
  shuffle <- lapply(my_split, function(x) {
    nrX <- nrow(x)
    cbind(x[, c('replicate', 'level')],
          high.density = x[sample(seq_len(nrX), replace = TRUE), 'high.density'],
          low.density = x[sample(seq_len(nrX), replace = TRUE), 'low.density'])
  })
  DF_new <- do.call(rbind, shuffle)
  rownames(DF_new) <- NULL
  return(DF_new)
}

B <- 1000
df_list <- replicate(B, df_shuffle(df), simplify = FALSE)
df_list <- lapply(df_list, function(x) x[complete.cases(x), ]) #choose complete cases

现在我想引导这些观察，以估计系数、p值和置信区间。我正在尝试复制引导函数like in this example并绘制正确的置信区间like in this example(我只需要整个引导线和置信区间)

#绘图示例代码

df_boot <-  rbindlist(df_list, idcol = 'count')

ggplot(aes(x = low.density, y = high.density), data = df_boot) +  
  stat_smooth(aes(group = factor(count)), geom = "line", method = "lm", se = FALSE, color = "red", alpha=0.02) +
  stat_smooth(geom = "line", method = "lm", se = FALSE, color = "black", linetype = "longdash") +
  theme(panel.background = element_blank()) + theme(legend.position="none")

推荐答案

我们可以对假设的线性模型进行非参数或参数引导。非参数引导和参数引导之间的重要区别在于，在非参数情况下，我们将重复从原始数据帧df中采样，而在参数情况下，我们将从原始模型中模拟新数据。

我们假设以下线性模型：

high.density_i=b₀+_{b<₁low.density+_{_{i，_{i_{i，_{i_{_{e_i}}}}}}}}

重要的是，首先逐个列表删除包含丢失观测的行，然后运行引导过程(即使您最好在独立变量中乘以丢失观测，因为如果丢失数据机制不是所谓的MCAR类型的，那么您将获得有偏见的推断)。在您的问题中提到的post with the functiondf_shuffle()中，是在重新采样数据之后执行的列表删除。在重采样之前执行列表删除可确保每个引导示例的行数与df相同。这是引导运行的前提条件，因此能够基于引导过程进行有效推断。

函数boot_lm()允许用户执行非参数或参数引导。它由以下参数组成：

original_model：指定假设线性模型的字符串。

original_data：指定用于拟合假设线性模型的数据帧的字符串。

type：指定参数(param)还是非参数(ordinary)引导的字符串。

B：要采集的引导样本数。

seed：修复随机数生成器的整数。

# listwise deletion df <- df[complete.cases(df), ] # linear model to be bootstrapped fm0 <- lm(high.density ~ low.density, data = df) boot_lm <- function(original_data, original_model, type = c('ordinary', 'param'), B = 1000L, seed = 1) { set.seed(seed) betas_original_model <- coef(original_model) len_coef <- length(betas_original_model) mat <- matrix(rep(0L, B * len_coef), ncol = len_coef) if (type %in% 'ordinary') { n_rows <- length(residuals(original_model)) for (i in seq_len(B)) { boot_dat <- original_data[sample(seq_len(n_rows), replace = TRUE), ] mat[i, ] <- coef(lm(terms(original_model), data = boot_dat)) } } if (type %in% 'param') { X <- model.matrix(delete.response(terms(original_model)), data = original_data)[, -1L] for (i in seq_len(B)) { mat[i, ] <- coef(lm(unlist(simulate(original_model)) ~ X, data = original_data)) } } confints <- matrix(rep(0L, 2L * len_coef), ncol = 2L) pvals <- numeric(len_coef) for (i in seq_len(len_coef)) { pvals[i] <- mean(abs(mat[, i] - mean(mat[, i])) > abs(betas_original_model[i])) confints[i, ] <- quantile(mat[, i], c(.025, 0.975)) } names(pvals) <- names(betas_original_model) out <- data.frame(estimate = betas_original_model, 'lwr' = confints[, 1], 'upr' = confints[, 2], p_value = pvals) return(out) }

输出：您的数据

# non-parametric bootstrap ordinary <- boot_lm(original_data = df, original_model = fm0, type = 'ordinary', B = 1e4) > ordinary estimate lwr upr p_value (Intercept) 45.1522806 16.290080 88.6969733 0.0220 low.density -0.6492639 -2.055204 0.5368766 0.2792 # -------------------------------------------------------- # parametric bootstrap param <- boot_lm(original_data = df, original_model = fm0, type = 'param', B = 1e4) > param estimate lwr upr p_value (Intercept) 45.1522806 10.472075 79.1197394 0.0103 low.density -0.6492639 -1.971258 0.6381189 0.3245

输出：mtcar

# linear model to be bootstrapped fm1 <- lm(mpg ~ wt + cyl + qsec, data = mtcars) ordinary <- boot_lm(original_data = mtcars, original_model = fm1, type = 'ordinary', B = 1e4) > ordinary estimate lwr upr p_value (Intercept) 29.4290521 13.8283579 41.2637258 0.0009 wt -3.8616401 -6.6867159 -2.0884969 0.0084 cyl -0.9277487 -1.9447741 0.4831599 0.1174 qsec 0.4944817 -0.1141793 1.3369213 0.1825

这篇关于如何对线性回归进行Bootstrap和估计R中的置信区间？的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

如何对线性回归进行Bootstrap和估计R中的置信区间？ [英] How to bootstrap a linear regression and estimate confidence intervals in R?

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

如何对线性回归进行Bootstrap和估计R中的置信区间？ [英] How to bootstrap a linear regression and estimate confidence intervals in R?

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭