Calculating upper and lower confidence intervals by group in dplyr summarise()
Problem description
I am trying to make a table that shows N (the number of observations), the percent frequency (of answers > 0), and the lower and upper confidence intervals for the percent frequency, all grouped by type.
Example data
dat <- data.frame(
"type" = c("B","B","A","B","A","A","B","A","A","B","A","A","A","B","B","B"),
"num" = c(3,0,0,9,6,0,4,1,1,5,6,1,3,0,0,0)
)
Expected output (to be filled in with values):
Type N Percent Lower 95% CI Upper 95% CI
A
B
Attempt
library(dplyr)
library(qwraps2)
table<-dat %>%
group_by(type) %>%
summarise(N=n(),
mean.ci = mean_ci(dat$num),
"Percent"=n_perc(num > 0))
This worked to get N and the percent frequency, but once I added mean_ci it returned the error "Column must be length 1 (a summary value), not 3".
The second piece of code I tried, found here:
table2<-dat %>%
group_by(type) %>%
summarise(N.num=n(),
mean.num = mean(dat$num),
sd.num = sd(dat$num),
"Percent"=n_perc(num > 0)) %>%
mutate(se.num = sd.num / sqrt(N.num),
lower.ci = 100*(mean.num - qt(1 - (0.05 / 2), N.num - 1) * se.num),
upper.ci = 100*(mean.num + qt(1 - (0.05 / 2), N.num - 1) * se.num))
# A tibble: 2 x 8
# type N.num mean.num sd.num Percent se.num lower.ci upper.ci
# <fct> <int> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
#1 A 8 2.44 2.83 "6 (75.00\\%)" 1.00 7.35 480.
#2 B 8 2.44 2.83 "4 (50.00\\%)" 1.00 7.35 480.
This gave me an output, but the confidence intervals are not logical.
Recommended answer
The output of mean_ci is a vector of length 3. This may be unexpected, because the package adds a print method: in the console the result looks like a single character value rather than a numeric vector of length > 1. You can see the underlying data structure by inspecting it with str.
mean_ci(dat$num) %>% str
# 'qwraps2_mean_ci' Named num [1:3] 2.44 1.05 3.82
# - attr(*, "names")= chr [1:3] "mean" "lcl" "ucl"
# - attr(*, "alpha")= num 0.05
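To make the length issue concrete, here is a minimal base R sketch. It uses a plain named vector as a stand-in for the mean_ci result; the name ci is only for illustration, and the values are copied from the str output above.

```r
# Hypothetical stand-in for the value returned by mean_ci(dat$num):
# a named numeric vector with the same three elements.
ci <- c(mean = 2.44, lcl = 1.05, ucl = 3.82)

length(ci)        # 3: too long for a single summary cell

# Wrapping the vector in a list gives an object of length 1
wrapped <- list(ci)
length(wrapped)   # 1

# The individual values are still reachable by name
wrapped[[1]][["lcl"]]
```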
In summarise, each element of each output column must have length 1, so handing it a length-3 object to put in a single "cell" (column element) results in an error. A workaround is to wrap the length-3 vector in a list, which makes it a list of length 1. You can then use unnest_wider to split it into 3 columns (making the table "wider").
library(tidyverse)  # dplyr plus tidyr (for unnest_wider)
library(qwraps2)    # mean_ci and n_perc
dat %>%
group_by(type) %>%
summarise( N=n(),
mean.ci = list(mean_ci(num)),
"Percent"= n_perc(num > 0)) %>%
unnest_wider(mean.ci)
# # A tibble: 2 x 6
# type N mean lcl ucl Percent
# <fct> <int> <dbl> <dbl> <dbl> <chr>
# 1 A 8 2.25 0.523 3.98 "6 (75.00\\%)"
# 2 B 8 2.62 0.344 4.91 "4 (50.00\\%)"
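The table above gives a confidence interval for the mean of num, while the question asked for an interval around the percent frequency. A minimal sketch of one way to get that is below. It assumes an exact (Clopper-Pearson) binomial interval from base R's binom.test is acceptable; the column names positives, percent, lower_ci and upper_ci are made up for this example and are not part of the original answer.

```r
library(dplyr)

dat <- data.frame(
  type = c("B","B","A","B","A","A","B","A","A","B","A","A","A","B","B","B"),
  num  = c(3,0,0,9,6,0,4,1,1,5,6,1,3,0,0,0)
)

prop_tbl <- dat %>%
  group_by(type) %>%
  summarise(
    N         = n(),
    positives = sum(num > 0),          # answers > 0 within the group
    percent   = 100 * mean(num > 0),
    # Assumption: exact binomial 95% interval; each bound is length 1,
    # so it fits in a single summary cell.
    lower_ci  = 100 * binom.test(positives, N)$conf.int[1],
    upper_ci  = 100 * binom.test(positives, N)$conf.int[2]
  )

prop_tbl
```

Note that the columns are referred to as num, not dat$num. Inside summarise, dat$num pulls the whole ungrouped column, which is why both rows of the second attempt in the question show the same mean of 2.44.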