ggplot2:如何为geom_smooth中的预测获取鲁棒的置信区间? [英] ggplot2: how to get robust confidence interval for predictions in geom_smooth?

查看:302
本文介绍了ggplot2:如何为geom_smooth中的预测获取鲁棒的置信区间?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

考虑这个简单的例子

dataframe <- data_frame(x = c(1,2,3,4,5,6),
                        y = c(12,24,24,34,12,15))
> dataframe
# A tibble: 6 x 2
      x     y
  <dbl> <dbl>
1     1    12
2     2    24
3     3    24
4     4    34
5     5    12
6     6    15    

dataframe %>% ggplot(., aes(x = x, y = y)) + 
geom_point() + 
geom_smooth(method = 'lm', formula = y~x)

此处,标准误差是使用默认选项计算的.但是,我想使用软件包sandwichlmtest

Here the standard errors are computed with the default option. However, I would like to use the robust variance-covariance matrix available in the package sandwich and lmtest

也就是说,使用vcovHC(mymodel, "HC3")

是否可以使用geom_smooth()函数以一种简单的方式获得该信息?

Is there a way to get that in a simple way using the geom_smooth() function?

推荐答案

HC健壮的SE(简单)

est > 包及其lm_robust函数系列.例如

This is easily done now thanks to the estimatr package and its family of lm_robust functions. E.g.

library(tidyverse)
library(estimatr)

dataframe <- data.frame(x = c(1,2,3,4,5,6),
                        y = c(12,24,24,34,12,15))

dataframe %>% 
  ggplot(aes(x = x, y = y)) + 
  geom_point() + 
  geom_smooth(method = 'lm_robust', formula = y~x, fill="#E41A1C") + ## Robust (HC) SEs
  geom_smooth(method = 'lm', formula = y~x, col = "grey50") + ## Just for comparison
  labs(
    title = "Plotting HC robust SEs in ggplot2",
    subtitle = "Regular SEs in grey for comparison"
    ) +
  theme_minimal()

reprex软件包(v0.3.0)创建于2020-03-08 sup>

Created on 2020-03-08 by the reprex package (v0.3.0)

HAC强大的SE(更多的日常工作)

一个警告是 estimatr 不会还提供了对HAC(即异方差自相关一致)SEs a la Newey-West的支持.但是,可以使用 sandwich 包手动获取这些信息……这还是原始问题一直在问的一种.然后可以使用geom_ribbon()绘制它们.

The one caveat is that estimatr does not yet offer support for HAC (i.e. heteroscedasticity and autocorrelation consistent) SEs a la Newey-West. However, it is possible to obtain these manually with the sandwich package... which is kind of what the original question was asking anyway. You can then plot them using geom_ribbon().

我要记录一下,对于这个特定的数据集,HAC SE并没有多大意义.但是,这里有一个示例,显示了如何做到这一点,以关于相关主题的出色答案很好.

I'll say for the record that HAC SEs don't make much sense for this particular data set. But here's an example of how you could do it, riffing off this excellent SO answer on a related topic.

library(tidyverse)
library(sandwich)

dataframe <- data.frame(x = c(1,2,3,4,5,6),
                        y = c(12,24,24,34,12,15))

reg1 <- lm(y~x, data = dataframe)

## Generate a prediction DF
pred_df <- data.frame(fit = predict(reg1))

## Get the design matrix
X_mat <- model.matrix(reg1)

## Get HAC VCOV matrix and calculate SEs
v_hac <- NeweyWest(reg1, prewhite = FALSE, adjust = TRUE) ## HAC VCOV (adjusted for small data sample)
#> Warning in meatHAC(x, order.by = order.by, prewhite = prewhite, weights =
#> weights, : more weights than observations, only first n used
var_fit_hac <- rowSums((X_mat %*% v_hac) * X_mat)  ## Point-wise variance for predicted mean
se_fit_hac <- sqrt(var_fit_hac) ## SEs

## Add these to pred_df and calculate the 95% CI
pred_df <-
  pred_df %>%
  mutate(se_fit_hac = se_fit_hac) %>%
  mutate(
    lwr_hac = fit - qt(0.975, df=reg1$df.residual)*se_fit_hac,
    upr_hac = fit + qt(0.975, df=reg1$df.residual)*se_fit_hac
    )

pred_df
#>        fit se_fit_hac   lwr_hac  upr_hac
#> 1 20.95238   4.250961  9.149822 32.75494
#> 2 20.63810   2.945392 12.460377 28.81581
#> 3 20.32381   1.986900 14.807291 25.84033
#> 4 20.00952   1.971797 14.534936 25.48411
#> 5 19.69524   2.914785 11.602497 27.78798
#> 6 19.38095   4.215654  7.676421 31.08548

## Plot it
bind_cols(
  dataframe,
  pred_df
  ) %>%
  ggplot(aes(x = x, y = y, ymin=lwr_hac, ymax=upr_hac)) + 
  geom_point() + 
  geom_ribbon(fill="#E41A1C", alpha=0.3, col=NA) + ## Robust (HAC) SEs
  geom_smooth(method = 'lm', formula = y~x, col = "grey50") + ## Just for comparison
  labs(
    title = "Plotting HAC SEs in ggplot2",
    subtitle = "Regular SEs in grey for comparison",
    caption = "Note: Do HAC SEs make sense for this dataset? Definitely not!"
    ) +
  theme_minimal()

reprex软件包(v0.3.0)创建于2020-03-08 sup>

Created on 2020-03-08 by the reprex package (v0.3.0)

请注意,如果您愿意,也可以使用此方法手动计算和绘制其他鲁棒的SE预测(例如HC1,HC2等).您所需要做的就是使用相关的三明治估算器.例如,使用vcovHC(reg1, type = "HC2")代替NeweyWest(reg1, prewhite = FALSE, adjust = TRUE)将为您提供与第一个使用 estimatr 软件包的示例相同的HC鲁棒CI.

Note that you could also use this approach to manually calculate and plot other robust SE predictions (e.g. HC1, HC2,etc.) if you so wished. All you would need to do is use the relevant sandwich estimator. For instance, using vcovHC(reg1, type = "HC2") instead of NeweyWest(reg1, prewhite = FALSE, adjust = TRUE) will give you an identical HC-robust CI to the first example that uses the estimatr package.

这篇关于ggplot2:如何为geom_smooth中的预测获取鲁棒的置信区间?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆