Confidence interval and p-values for difference between means with summarize function and tidyverse
Question
I am trying to figure out how to turn a data frame from long to wide, while grouping by two variables (diamond cut and colors D and F from diamonds df) and summarizing some key features of the data at the same time.
Specifically, I am trying to get the difference between two means, 95% CI and p-values around that difference.
Here is an example of my desired output table (in red is what I am trying to accomplish).
Sample code below, showing how far I've gotten:
library(tidyverse)
library(data.table)  # needed for dcast() and as.data.table() below

# Build summary data: mean depth and its 95% CI per cut and color
diamonds <- diamonds %>%
  select(cut, depth, color) %>%
  filter(color == "F" | color == "D") %>%
  group_by(cut, color) %>%
  summarise(mean = mean(depth),
            lower_ci = mean(depth) - qt(1 - 0.05/2, (n() - 1)) * sd(depth) / sqrt(n()),
            upper_ci = mean(depth) + qt(1 - 0.05/2, (n() - 1)) * sd(depth) / sqrt(n()))

# Turn table from long to wide
diamonds <- dcast(as.data.table(diamonds), cut ~ color, value.var = c("mean", "lower_ci", "upper_ci"))

# Rename & calculate the mean difference
diamonds <- diamonds %>%
  rename(
    Cut = cut,
    Mean.Depth.D = mean_D,
    Mean.Depth.F = mean_F,
    Lower.CI.Depth.D = lower_ci_D,
    Lower.CI.Depth.F = lower_ci_F,
    Upper.CI.Depth.D = upper_ci_D,
    Upper.CI.Depth.F = upper_ci_F) %>%
  mutate(Mean.Difference = Mean.Depth.D - Mean.Depth.F)

# Re-organize the table
diamonds <- subset(diamonds, select = c(Cut:Mean.Depth.F, Mean.Difference, Lower.CI.Depth.D:Upper.CI.Depth.F))

# Calculate the CIs (upper and lower) and p-values for the mean difference for each cut and insert them into the table.
# ?
I think I am supposed to calculate the CIs and p-values for the mean difference in depth between colors D and F at some point before I summarize, but I am not exactly sure how.
Thanks for your input.
Recommended answer
To get comparisons of means (with t-tests) for D and F colours across different values of cut, this is what you would need to do:
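library(tidyverse)
library(broom)

# Start from the original dataset (ggplot2::diamonds), not the summary table
# that the question's code assigns to the name `diamonds`.
ggplot2::diamonds %>%
  filter(color %in% c("D", "F")) %>%
  group_by(cut) %>%
  # One two-sample t-test of depth by color per cut; tidy() returns the
  # difference in means, its confidence interval and the p-value as one row.
  do(tidy(t.test(data = ., depth ~ color)))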
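To get from that output to the wide table described in the question, the tidied t-test columns can be selected and renamed directly. The following is only a sketch under a few assumptions: it uses group_modify() from dplyr in place of do(), relies on the default Welch (unequal variance) t-test, and the output column names (Mean.Difference, Diff.Lower.CI, Diff.Upper.CI, P.Value) as well as the object name diff_tbl are made up here for illustration. The difference is D minus F, because D comes before F in the factor levels.

```r
library(dplyr)
library(broom)

# Welch two-sample t-test of depth by color (D vs F) within each cut.
# ggplot2::diamonds is referenced explicitly so that a session-level object
# named `diamonds` cannot mask the original data.
diff_tbl <- ggplot2::diamonds %>%
  filter(color %in% c("D", "F")) %>%
  mutate(color = droplevels(color)) %>%   # keep only the two levels in use
  group_by(cut) %>%
  group_modify(~ tidy(t.test(depth ~ color, data = .x))) %>%
  ungroup() %>%
  # Illustrative column names; estimate1/estimate2 are the group means
  # (D and F), estimate is their difference (D - F).
  select(
    Cut             = cut,
    Mean.Depth.D    = estimate1,
    Mean.Depth.F    = estimate2,
    Mean.Difference = estimate,
    Diff.Lower.CI   = conf.low,
    Diff.Upper.CI   = conf.high,
    P.Value         = p.value
  )

print(diff_tbl)
```

The per-group CIs for each mean from the question's summarise() step could be joined onto this table by Cut if they are still needed. If equal variances are a reasonable assumption, passing var.equal = TRUE to t.test() switches to the pooled test.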