Calculating length of 95%-CI using dplyr
Problem description
Last time I asked how to calculate the average score per measurement occasion (week) for a variable (procras) that was measured repeatedly for multiple respondents. My (simplified) dataset in long format looks like the following (here two students and 5 time points, no grouping variable):
studentID week procras
1 0 1.4
1 6 1.2
1 16 1.6
1 28 NA
1 40 3.8
2 0 1.4
2 6 1.8
2 16 2.0
2 28 2.5
2 40 2.8
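To make the examples reproducible, the ten rows above can be entered as a data frame. This is a sketch: the name DataRlong comes from the question's code, but this toy version contains only the rows shown, so its means will not match the output further down.

```r
# Toy version of the long-format data shown above:
# one row per student and measurement occasion (week), with one missing score.
DataRlong <- data.frame(
  studentID = rep(c(1, 2), each = 5),
  week      = rep(c(0, 6, 16, 28, 40), times = 2),
  procras   = c(1.4, 1.2, 1.6, NA, 3.8,
                1.4, 1.8, 2.0, 2.5, 2.8)
)

str(DataRlong)
```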
Using dplyr, I get the average score per measurement occasion with:
mean_data <- group_by(DataRlong, week) %>% summarise(procras = mean(procras, na.rm = TRUE))
The result looks like this, for example (these values come from the full dataset, not the two-student excerpt above):
Source: local data frame [5 x 2]
occ procras
(dbl) (dbl)
1 0 1.993141
2 6 2.124020
3 16 2.251548
4 28 2.469658
5 40 2.617903
With ggplot2 I can now plot the average change over time, and by simply adjusting the group_by() call in dplyr I can also get means per subgroup (for instance, the average score per occasion for men and women). Now I would like to add a column to the mean_data table that contains the length of the 95% CI around the average score per occasion.
http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_(ggplot2)/ explains how to compute and plot CIs, but that approach seems to become problematic as soon as I want to do this for subgroups, right? So is there a way to let dplyr include the CI (based on group size, etc.) automatically in mean_data? After that, it should hopefully be fairly easy to plot the new values as CIs in the graphs. Thank you.
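For reference, the usual t-based 95% interval for a mean, assuming roughly normal scores, depends on the group size through both the standard error and the t quantile. For occasion j with n_j non-missing scores, mean x̄_j and standard deviation s_j (in R, the quantile is qt(0.975, n_j - 1)):

```latex
\bar{x}_j \pm t_{0.975,\,n_j-1}\,\frac{s_j}{\sqrt{n_j}},
\qquad
\text{length}_j = 2\,t_{0.975,\,n_j-1}\,\frac{s_j}{\sqrt{n_j}}
```

Other interval methods (for example bootstrap intervals) would give different lengths.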
You could do it manually by computing a few extra quantities in summarise and then using mutate:
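library(dplyr)
mtcars %>%
  group_by(vs) %>%
  # per-group mean, standard deviation and size; n() counts all rows in the group,
  # so with missing values sum(!is.na(mpg)) would be the matching count
  summarise(mean.mpg = mean(mpg, na.rm = TRUE),
            sd.mpg = sd(mpg, na.rm = TRUE),
            n.mpg = n()) %>%
  # standard error and t-based 95% confidence limits
  mutate(se.mpg = sd.mpg / sqrt(n.mpg),
         lower.ci.mpg = mean.mpg - qt(1 - (0.05 / 2), n.mpg - 1) * se.mpg,
         upper.ci.mpg = mean.mpg + qt(1 - (0.05 / 2), n.mpg - 1) * se.mpg)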
#> Source: local data frame [2 x 7]
#>
#> vs mean.mpg sd.mpg n.mpg se.mpg lower.ci.mpg upper.ci.mpg
#> (dbl) (dbl) (dbl) (int) (dbl) (dbl) (dbl)
#> 1 0 16.61667 3.860699 18 0.9099756 14.69679 18.53655
#> 2 1 24.55714 5.378978 14 1.4375924 21.45141 27.66287
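A minimal sketch applying the same approach to the question's procras data, assuming the two-student toy rows stand in for DataRlong. It differs from the mtcars version in two ways: the group size counts only non-missing scores (sum(!is.na(procras)) rather than n(), because mean() and sd() drop NAs), and a ci.length column holds the full interval width the question asks for. With only one or two scores per week, the intervals are either very wide (the t quantile with 1 df is about 12.7) or NA, so the numbers are illustrative only.

```r
library(dplyr)

# Toy stand-in for DataRlong: the two-student rows from the question.
DataRlong <- data.frame(
  studentID = rep(c(1, 2), each = 5),
  week      = rep(c(0, 6, 16, 28, 40), times = 2),
  procras   = c(1.4, 1.2, 1.6, NA, 3.8,
                1.4, 1.8, 2.0, 2.5, 2.8)
)

mean_data <- DataRlong %>%
  group_by(week) %>%  # for subgroups, add the column here, e.g. group_by(week, sex)
  summarise(mean.procras = mean(procras, na.rm = TRUE),
            sd.procras   = sd(procras, na.rm = TRUE),
            # count only non-missing scores, matching na.rm = TRUE above
            n.procras    = sum(!is.na(procras))) %>%
  mutate(se.procras = sd.procras / sqrt(n.procras),
         # pmax() keeps df >= 1 so qt() does not warn for single-score weeks;
         # their sd (and therefore the CI) is NA anyway
         tcrit      = qt(1 - 0.05 / 2, df = pmax(n.procras - 1, 1)),
         lower.ci   = mean.procras - tcrit * se.procras,
         upper.ci   = mean.procras + tcrit * se.procras,
         ci.length  = upper.ci - lower.ci)

mean_data

# Cross-check one occasion against t.test(), which uses the same t interval.
week6 <- DataRlong$procras[DataRlong$week == 6]
t.test(week6)$conf.int
```

The sex column in the comment is hypothetical; use whatever subgroup variable your data has. The lower.ci and upper.ci columns can then be mapped to ymin and ymax in ggplot2's geom_errorbar().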