Calculating length of 95%-CI using dplyr


Problem description


Last time I asked how to calculate the average score per measurement occasion (week) for a variable (procras) that has been measured repeatedly for multiple respondents. My (simplified) dataset in long format looks like the following (here two students and 5 time points, no grouping variable):

studentID  week   procras
   1        0     1.4
   1        6     1.2
   1        16    1.6
   1        28    NA
   1        40    3.8
   2        0     1.4
   2        6     1.8
   2        16    2.0
   2        28    2.5
   2        40    2.8

Using dplyr I would get the average score per measurement occasion:

mean_data <- group_by(DataRlong, week) %>% summarise(procras = mean(procras, na.rm = TRUE))

The result looks like this, for example:

Source: local data frame [5 x 2]
        occ  procras
      (dbl)    (dbl)
    1     0 1.993141
    2     6 2.124020
    3    16 2.251548
    4    28 2.469658
    5    40 2.617903

With ggplot2 I can now plot the average change over time, and by adjusting the group_by() call of dplyr I can also get means per subgroup (for instance, the average score per occasion for men and women). Now I would like to add a column to the mean_data table that contains the length of the 95%-CI around the average score per occasion.

http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/ explains how to get and plot CIs, but this approach seems to become problematic as soon as I want to do this for any subgroup, right? So is there a way to let dplyr also include the CI (based on group size, etc.) automatically in mean_data? After that it should be fairly easy to plot the new values as CIs in the graphs, I hope. Thank you.

Solution

You could do it manually, using mutate and a few extra functions in summarise:

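library(dplyr)
mtcars %>%
  group_by(vs) %>%
  # per-group mean, standard deviation and group size
  # (note: n() counts all rows in the group, including any with NA)
  summarise(mean.mpg = mean(mpg, na.rm = TRUE),
            sd.mpg = sd(mpg, na.rm = TRUE),
            n.mpg = n()) %>%
  # standard error and t-based 95% confidence limits
  mutate(se.mpg = sd.mpg / sqrt(n.mpg),
         lower.ci.mpg = mean.mpg - qt(1 - (0.05 / 2), n.mpg - 1) * se.mpg,
         upper.ci.mpg = mean.mpg + qt(1 - (0.05 / 2), n.mpg - 1) * se.mpg)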

#> Source: local data frame [2 x 7]
#> 
#>      vs mean.mpg   sd.mpg n.mpg    se.mpg lower.ci.mpg upper.ci.mpg
#>   (dbl)    (dbl)    (dbl) (int)     (dbl)        (dbl)        (dbl)
#> 1     0 16.61667 3.860699    18 0.9099756     14.69679     18.53655
#> 2     1 24.55714 5.378978    14 1.4375924     21.45141     27.66287
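
Applied to the data in the question, the same pattern might look like the sketch below. It is only an illustration under a few assumptions: DataRlong is rebuilt here from the ten example rows, the column names mean_procras, n, sd, se, t_crit, ci_lower, ci_upper and ci_length are made up for this example, and the group size is taken as the number of non-missing values rather than n(), because procras contains an NA.

```r
library(dplyr)

# Toy data copied from the question (two students, five weeks, one NA).
DataRlong <- data.frame(
  studentID = rep(1:2, each = 5),
  week      = rep(c(0, 6, 16, 28, 40), times = 2),
  procras   = c(1.4, 1.2, 1.6, NA, 3.8,
                1.4, 1.8, 2.0, 2.5, 2.8)
)

mean_data <- DataRlong %>%
  group_by(week) %>%
  summarise(
    mean_procras = mean(procras, na.rm = TRUE),
    sd           = sd(procras, na.rm = TRUE),
    n            = sum(!is.na(procras))  # count non-missing values only
  ) %>%
  mutate(
    se        = sd / sqrt(n),
    # pmax() keeps the degrees of freedom at 1 or more; for n = 1 the
    # sd is NA anyway, so the interval for that week stays NA.
    t_crit    = qt(1 - 0.05 / 2, df = pmax(n - 1, 1)),
    ci_lower  = mean_procras - t_crit * se,
    ci_upper  = mean_procras + t_crit * se,
    ci_length = ci_upper - ci_lower      # full width of the 95% CI
  )

print(mean_data)
```

For subgroups, adding the grouping column to group_by() (for example a hypothetical gender column: group_by(week, gender)) should give one interval per subgroup and occasion. The ci_lower and ci_upper columns can then be mapped to ymin and ymax in ggplot2's geom_errorbar().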
