Plot regression coefficients with confidence intervals


Question

Suppose I have two data frames, one for 2015 and one for 2016. I want to run a regression on each data frame and plot one of the coefficients from each regression together with its confidence interval. For example:

set.seed(1020022316)
library(dplyr)
library(stargazer)

df16 <- data.frame(
  x1 = rnorm(1000, 0, 2),
  t = sample(c(0, 1), 1000, T),
  e = rnorm(1000, 0, 10)
) %>% mutate(y = 0.5 * x1 + 2 * t + e) %>%
  select(-e)

df15 <- data.frame(
  x1 = rnorm(1000, 0, 2),
  t = sample(c(0, 1), 1000, T),
  e = rnorm(1000, 0, 10)
) %>% mutate(y = 0.75 * x1 + 2.5 * t + e) %>%
  select(-e)

lm16 <- lm(y ~ x1 + t, data = df16)

lm15 <- lm(y ~ x1 + t, data = df15)

stargazer(lm15, lm16, type="text", style = "aer", ci = TRUE, ci.level = 0.95)

I want to plot the estimated coefficient on t for each year (t = 1.558 at x = 2015 and t = 2.797 at x = 2016), each with its 95% CI. What is the best way of doing this?

I could do it "by hand", but I hope there is a better way:

library(ggplot2)
df.plot <-
  data.frame(
    y = c(lm15$coefficients[['t']], lm16$coefficients[['t']]),
    x = c(2015, 2016),
    lb = c(
      confint(lm15, 't', level = 0.95)[1],
      confint(lm16, 't', level = 0.95)[1]
    ),
    ub = c(
      confint(lm15, 't', level = 0.95)[2],
      confint(lm16, 't', level = 0.95)[2]
    )
  )
df.plot %>% ggplot(aes(x, y)) + geom_point() +
  geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1) + 
  geom_hline(aes(yintercept=0), linetype="dashed")


Ideally, the solution should give a good-looking figure, use elegant code, and extend easily to more than two regressions.

Recommended answer

This is a bit too long for a comment, so I am posting it as a partial answer.

It is unclear from your post whether your main problem is getting the data into the right shape or the plotting itself. To follow up on one of the comments, let me show you how to run several models with dplyr and broom in a way that makes plotting easy. Consider the mtcars dataset:

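 library(dplyr)
 library(broom)
 library(ggplot2)  # needed for the plots further down

 # Fit mpg ~ disp separately for each cyl group; tidy() returns one row per
 # coefficient, and conf.int = TRUE adds the conf.low / conf.high columns
 models <- mtcars %>% group_by(cyl) %>% 
           do(data.frame(tidy(lm(mpg ~ disp, data = .), conf.int = TRUE)))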

 head(models) # I have abbreviated the following output a bit

    cyl        term estimate std.error statistic   p.value conf.low conf.high
  (dbl)       (chr)    (dbl)     (dbl)     (dbl)     (dbl)    (dbl)     (dbl)
     4 (Intercept)  40.8720    3.5896     11.39 0.0000012   32.752  48.99221
     4        disp  -0.1351    0.0332     -4.07 0.0027828   -0.210  -0.06010
     6 (Intercept)  19.0820    2.9140      6.55 0.0012440   11.591  26.57264
     6        disp   0.0036    0.0156      0.23 0.8259297   -0.036   0.04360

This gives you all coefficients and confidence intervals in one tidy data frame, which makes plotting with ggplot easier. For instance, if your datasets have the same columns, you could add a year identifier to each of them (e.g. df1$year <- 2000; df2$year <- 2001, etc.) and bind them together afterwards (e.g. using bind_rows, or you can use bind_rows's .id option to create the identifier for you). Then you can use the year identifier instead of cyl in the example above.
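As a minimal sketch of the two ways to build that identifier: the data frames `df2000` and `df2001` below are hypothetical, simulated only for illustration, and the column name `year` is likewise an assumption.

```r
library(dplyr)

# Hypothetical yearly data frames with identical columns (illustration only)
set.seed(1)
df2000 <- data.frame(x1 = rnorm(50), t = rep(c(0, 1), 25))
df2000$y <- 0.5 * df2000$x1 + 2 * df2000$t + rnorm(50)
df2001 <- data.frame(x1 = rnorm(50), t = rep(c(0, 1), 25))
df2001$y <- 0.75 * df2001$x1 + 2.5 * df2001$t + rnorm(50)

# Option 1: add the identifier by hand, then stack the rows
stacked_manual <- bind_rows(
  mutate(df2000, year = 2000),
  mutate(df2001, year = 2001)
)

# Option 2: name the inputs and let bind_rows() build the identifier;
# the resulting column is character ("2000", "2001")
stacked_id <- bind_rows("2000" = df2000, "2001" = df2001, .id = "year")
```

Either result can then be passed to group_by(year) in place of group_by(cyl).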

The plotting is then simple. Using the mtcars data again, let's plot the coefficients for disp only (though you could also use faceting, grouping, etc.):

 ggplot(filter(models, term=="disp"), aes(x=cyl, y=estimate)) + 
          geom_point() + geom_errorbar(aes(ymin=conf.low, ymax=conf.high))
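The same pipeline can also be sketched without do(), which recent dplyr releases mark as superseded. The version below swaps in group_modify() and adds a dashed zero line like the one in the question; it is an alternative chosen for illustration, not the answer's original code, and the names `models_gm` and `p_disp` are made up here.

```r
library(dplyr)
library(broom)
library(ggplot2)

# Same per-group regressions as above, written with group_modify()
# (an alternative chosen for this sketch, not the answer's original code)
models_gm <- mtcars %>%
  group_by(cyl) %>%
  group_modify(~ tidy(lm(mpg ~ disp, data = .x), conf.int = TRUE)) %>%
  ungroup()

# Plot the disp coefficient per cyl group with its 95% confidence interval
p_disp <- models_gm %>%
  filter(term == "disp") %>%
  ggplot(aes(x = factor(cyl), y = estimate)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_point() +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.1) +
  labs(x = "cyl", y = "Coefficient on disp")
print(p_disp)
```

Treating cyl as a factor keeps the x-axis to the three groups that actually exist instead of a continuous scale.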

To use your data:

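 # Name the inputs so the .id column holds the year rather than "1" and "2"
 df <- bind_rows("2015" = df15, "2016" = df16, .id = "years")

 # Fit one model per year, keep the coefficient on t, and plot it with its
 # 95% CI (the plot is printed directly rather than assigned to `models`)
 df %>% group_by(years) %>% 
        do(data.frame(tidy(lm(y ~ x1 + t, data = .), conf.int = TRUE))) %>%
        filter(term == "t") %>% 
        ggplot(aes(x = years, y = estimate)) + geom_point() + 
        geom_errorbar(aes(ymin = conf.low, ymax = conf.high))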

Note that you can easily add more models just by binding more data to the main data frame. If you want to plot more than one coefficient, you can also use faceting, grouping or position dodging to adjust the look of the resulting plot.
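A minimal sketch of the position-dodge variant, using mtcars so that it runs on its own. The model mpg ~ disp + wt and the names `coefs`, `dodge` and `p_multi` are assumptions made for illustration.

```r
library(dplyr)
library(broom)
library(ggplot2)

# One regression per cyl group, keeping every slope (not the intercept)
coefs <- mtcars %>%
  group_by(cyl) %>%
  do(data.frame(tidy(lm(mpg ~ disp + wt, data = .), conf.int = TRUE))) %>%
  ungroup() %>%
  filter(term != "(Intercept)")

# Dodge points and error bars by term so the coefficients sit side by side
dodge <- position_dodge(width = 0.4)
p_multi <- ggplot(coefs, aes(x = factor(cyl), y = estimate, colour = term)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_point(position = dodge) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
                width = 0.2, position = dodge) +
  labs(x = "cyl", y = "Estimate with 95% CI")
print(p_multi)
```

When the coefficients live on very different scales, as disp and wt do here, replacing the colour mapping with facet_wrap(~ term, scales = "free_y") may read better than dodging.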
