dplyr summarise multiple columns using t.test
Question
Is it possible to run a t.test on multiple variables against the same categorical variable without first reshaping the dataset, as is done below?
data(mtcars)
library(dplyr)
library(tidyr)
j <- mtcars %>% gather(var, val, disp:qsec)
t <- j %>% group_by(var) %>% do(te = t.test(val ~ vs, data = .))
t %>% summarise(p = te$p.value)
I tried using
mtcars %>% summarise_each_(funs = (t.test(. ~ vs))$p.value, vars = disp:qsec)
but it throws an error.
Bonus: How can t %>% summarise(p = te$p.value) also include the name of the grouping variable?
Recommended answer
After the discussion with @aosmith and @Misha, here is one approach. As @aosmith wrote in the comments, you want to do the following.
mtcars %>%
summarise_each(funs(t.test(.[vs == 0], .[vs == 1])$p.value), vars = disp:qsec)
# vars1 vars2 vars3 vars4 vars5
#1 2.476526e-06 1.819806e-06 0.01285342 0.0007281397 3.522404e-06
vs is either 0 or 1 (the group). If you want to run a t-test between the two groups within a variable (e.g., disp), it seems that you need to subset the data as @aosmith suggested. Thank you for the contribution.
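Since summarise_each() and funs() are deprecated in current dplyr releases, the same subsetting idea can be sketched with across(). This is only a sketch, assuming dplyr 1.0.0 or later and tidyr 1.0.0 or later; the names p_wide and p_long are made up for illustration. Because across() keeps the original column names, reshaping the one-row result also gives a column naming each tested variable, which is one possible way to address the bonus question.

```r
library(dplyr)
library(tidyr)

data(mtcars)

# One Welch t-test per column in disp:qsec, comparing the rows with
# vs == 0 against the rows with vs == 1 (p_wide is a hypothetical name).
p_wide <- mtcars %>%
  summarise(across(disp:qsec, ~ t.test(.x[vs == 0], .x[vs == 1])$p.value))

# Only the one-row result is reshaped, not the data itself, so that each
# p-value sits next to the name of the variable it was computed from.
p_long <- p_wide %>%
  pivot_longer(everything(), names_to = "var", values_to = "p")

p_long
```

The p-values should match those from t.test(val ~ vs) in the question, since both forms run the default Welch test on the same two groups.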
What I originally suggested works in a different situation, in which you simply compare two columns. Here is sample data and code.
foo <- data.frame(country = "Iceland",
year = 2014,
id = 1:30,
A = sample.int(1e5, 30, replace = TRUE),
B = sample.int(1e5, 30, replace = TRUE),
C = sample.int(1e5, 30, replace = TRUE),
stringsAsFactors = FALSE)
If you want to run t-tests for the A-C and B-C combinations, the following would be one way.
foo2 <- foo %>%
  summarise_each(funs(t.test(., C, paired = TRUE)$p.value), vars = A:B)
names(foo2) <- colnames(foo[4:5])
# A B
#1 0.2937979 0.5316822
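For reference, the two-column comparison can also be written with across(), which keeps the column names A and B, so the names() fix-up is not needed. This is a minimal sketch assuming dplyr 1.0.0 or later; the seed is arbitrary and is set only to make the sample data reproducible, so the p-values will differ from those printed above.

```r
library(dplyr)

set.seed(1)  # arbitrary seed (an assumption), used only for reproducibility
foo <- data.frame(id = 1:30,
                  A = sample.int(1e5, 30, replace = TRUE),
                  B = sample.int(1e5, 30, replace = TRUE),
                  C = sample.int(1e5, 30, replace = TRUE))

# Paired t-test of A against C and of B against C; the result has one row
# with columns named A and B.
foo2 <- foo %>%
  summarise(across(A:B, ~ t.test(.x, C, paired = TRUE)$p.value))

foo2
```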