Multiple t-test comparisons
Question
I would like to know how I can use t.test or pairwise.t.test to make multiple comparisons between genes. First, how can I compare all pairwise combinations (Gene 1 vs. Gene 3, Gene 3 vs. Gene 4, etc.)? Second, how can I compare only Gene 1 with each of the other genes?
Do I need to write a function for this?
Assuming I have the dataset below, what can I do when I get an "arguments are not the same length" error?
Thanks.
Gene S1 S2 S3 S4 S5 S6 S7
1 20000 12032 23948 2794 5870 782 699
3 15051 17543 18590 21005 22996 26448
4 35023 43092 41858 39637 40933 38865
Recommended answer
I think that @akrun has a great answer on the programming side of this. Since the question is also a statistical one, though, it seems important to mention that running multiple t-tests may not be a statistically sound method of analysis, depending on the number of comparisons in your full dataset: each extra test raises the chance of at least one false positive. So please keep that in mind. At the very least, applying a Bonferroni correction, or something similar, is recommended here, so I've added that to @akrun's code.
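As a small illustration of what such a correction does, the sketch below applies p.adjust to three made-up raw p-values (they are hypothetical, not taken from the gene data). Bonferroni multiplies each p-value by the number of tests and caps the result at 1. Holm is a less conservative step-down alternative, and it is also the default in pairwise.t.test.

```r
# Hypothetical raw p-values from three comparisons (not from the gene data)
p_raw <- c(0.01, 0.02, 0.03)

# Bonferroni: multiply each p-value by the number of tests, cap at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Holm: multiply the sorted p-values by 3, 2, 1 and enforce monotonicity
p_holm <- p.adjust(p_raw, method = "holm")

print(p_bonf)  # 0.03 0.06 0.09
print(p_holm)  # 0.03 0.04 0.04
```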
Before running the t-tests, it may also be best to run an ANOVA to check whether there are any differences among the groups overall. Columbia University has a helpful explanation of this approach on their stats page.
That said, I'll show how to do both in order to answer the programming side of the question. If you found this while looking up the same problem, please review your methods carefully before using this answer.
For the benefit of those less familiar with scientific notation, I've displayed the following results without it by setting options(scipen=999) in R.
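A minimal sketch of the effect of that setting, using the ANOVA p-value from the output in this answer as the example number. With the default scipen of 0, R prints it in scientific notation because that form is shorter. A large positive scipen makes R prefer fixed notation.

```r
p <- 0.00000245

old <- options(scipen = 0)   # R's default; remember the previous setting
print(p)                     # 2.45e-06

options(scipen = 999)        # strongly prefer fixed notation
print(p)                     # 0.00000245

options(old)                 # restore the previous setting
```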
Pre-test ANOVA:
library(tidyr)  # provides gather()
summary(aov(val ~ as.factor(Gene), data=gather(df, key, val, -Gene)))
Df Sum Sq Mean Sq F value Pr(>F)
as.factor(Gene) 2 2627772989 1313886494 34.49 0.00000245 ***
Residuals 15 571374752 38091650
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
T-tests:
library(broom)
library(dplyr)
library(tidyr)
gather(df, key, val, -Gene) %>%
do(data.frame(tidy(pairwise.t.test(.$val, .$Gene, p.adjust.method="bonferroni"))))
group1 group2 p.value
1 3 1 0.05691493022
2 4 1 0.00000209244
4 4 3 0.00018020669
For these tests, it doesn't particularly matter if the groups don't all have the same number of observations. The code I've outlined above will still run. However, it's generally good practice in R to represent blank or null values as NA. See this SO answer for a way to change values to NA.
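The answer does not show how df was built, so here is one possible sketch using the values from the question. It assumes that the missing values for genes 3 and 4 fall in the last column (S7); the original table does not make that explicit, so this is not guaranteed to reproduce the output printed above. The long-format step is done in base R here rather than with gather. The names long and res are just illustrative.

```r
# Assumption: genes 3 and 4 are missing their S7 value, so it is padded with NA
df <- data.frame(
  Gene = c(1, 3, 4),
  S1 = c(20000, 15051, 35023),
  S2 = c(12032, 17543, 43092),
  S3 = c(23948, 18590, 41858),
  S4 = c(2794, 21005, 39637),
  S5 = c(5870, 22996, 40933),
  S6 = c(782, 26448, 38865),
  S7 = c(699, NA, NA)
)

# Long format: one row per (Gene, sample) measurement.
# unlist() stacks the sample columns one after another, so the Gene
# labels repeat once per sample column.
long <- data.frame(
  Gene = rep(df$Gene, times = ncol(df) - 1),
  val  = unlist(df[-1], use.names = FALSE)
)

# pairwise.t.test() ignores NA values when computing group means and SDs
res <- pairwise.t.test(long$val, long$Gene, p.adjust.method = "bonferroni")
print(res$p.value)
```

Padding with NA keeps every row the same length, which is what data.frame requires, while the test itself only uses the observed values.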
If you'd like to limit your t-tests to only a few gene comparisons (for example, gene 1 vs. gene 3 and gene 1 vs. gene 4, but not gene 3 vs. gene 4), the simplest way is still to use the code above. Instead of applying the p-value correction inside the pairwise.t.test function, however, apply it afterward to only the p-values you want to assess. Try this:
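# Ask for raw p-values: pairwise.t.test() would otherwise apply Holm by default
res <- gather(df, key, val, -Gene) %>%
do(data.frame(tidy(pairwise.t.test(.$val, .$Gene, p.adjust.method = "none"))))
# Keep only the comparisons that involve gene 1
res <- res[res$group1 == 1 | res$group2 == 1, ]
# Correct across the retained comparisons only
res$p.value <- p.adjust(res$p.value, method = "bonferroni")
print(res)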
group1 group2 p.value
1 3 1 0.015989134399
2 4 1 0.000001458475
Note that the above applies the p-value correction only to the tests that we've subset and want to assess. In this example, that means every comparison that involves gene 1 and none of the comparisons that don't.
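On the question of whether a function is needed: it isn't, but a small helper is one way to compare a single reference gene against the others with plain t.test. The sketch below is an assumption-laden alternative, not the method used above. The function name compare_to_reference and its arguments are hypothetical, the vectors are typed in from the question's table, and t.test runs a Welch test on each pair on its own, whereas pairwise.t.test pools the SD across all groups by default. The p-values will therefore differ from the output above.

```r
# Values from the question's table, as one long vector with matching gene labels
val <- c(20000, 12032, 23948, 2794, 5870, 782, 699,   # gene 1 (7 values)
         15051, 17543, 18590, 21005, 22996, 26448,    # gene 3 (6 values)
         35023, 43092, 41858, 39637, 40933, 38865)    # gene 4 (6 values)
gene <- rep(c(1, 3, 4), times = c(7, 6, 6))

# Hypothetical helper: Welch t-test of the reference gene against each other gene
compare_to_reference <- function(val, gene, ref, method = "bonferroni") {
  others <- setdiff(unique(gene), ref)
  p_raw <- vapply(others, function(g) {
    # Unequal group sizes are fine for t.test()
    t.test(val[gene == ref], val[gene == g])$p.value
  }, numeric(1))
  data.frame(group1 = others, group2 = ref,
             p.raw = p_raw,
             p.adj = p.adjust(p_raw, method = method))
}

res <- compare_to_reference(val, gene, ref = 1)
print(res)
```

Because the two groups are passed to t.test as separate vectors, differing group lengths never cause a length error.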