Multiple t-test comparisons

Question

I would like to know how I can use t.test or pairwise.t.test to make multiple comparisons between genes. First, how can I compare all combinations (Gene 1 vs. Gene 3, Gene 3 vs. Gene 4, etc.)? Second, how can I compare only Gene 1 with each of the other genes?

Do I need to write a function for this?

Assuming I have the dataset below, what can I do when I get the error "arguments are not the same length"?

Thanks.

Gene   S1      S2      S3      S4      S5      S6     S7
1   20000   12032   23948    2794    5870     782    699
3   15051   17543   18590   21005   22996   26448
4   35023   43092   41858   39637   40933   38865
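Neither the question nor the answer shows how `df` was created. One way to build it, as a minimal sketch that assumes the blank S7 cells for genes 3 and 4 are missing values, is to code those cells as NA so every row has the same length:

```r
# Values copied from the table above; the blank S7 cells for genes 3 and 4
# are assumed to be missing and are coded as NA.
vals <- c(20000, 12032, 23948,  2794,  5870,   782, 699,
          15051, 17543, 18590, 21005, 22996, 26448,  NA,
          35023, 43092, 41858, 39637, 40933, 38865,  NA)

# One row per gene, columns Gene, S1, ..., S7
df <- data.frame(Gene = c(1, 3, 4),
                 matrix(vals, nrow = 3, byrow = TRUE,
                        dimnames = list(NULL, paste0("S", 1:7))))
str(df)
```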

Recommended answer

I think that @akrun has a great answer for the programming side of this, but since the question is also about statistics, it's worth mentioning that running multiple t-tests may not be a statistically sound method of analysis, depending on the number of comparisons in your full dataset: every additional test raises the chance of at least one false positive. So please keep that in mind. At the very least, a Bonferroni correction or something similar is recommended here, so I've added that to @akrun's code.
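The Bonferroni correction multiplies each p-value by the number of tests m and caps the result at 1. A small sketch with made-up p-values (hypothetical, not computed from this dataset) shows that base R's `p.adjust` does exactly that:

```r
# Hypothetical raw p-values from m = 3 comparisons (illustration only)
p_raw <- c(0.019, 0.0000007, 0.5)
m <- length(p_raw)

# Bonferroni by hand: p_adj = min(1, m * p)
p_manual <- pmin(1, m * p_raw)

# The same correction via base R
p_bonf <- p.adjust(p_raw, method = "bonferroni")

print(cbind(p_raw, p_manual, p_bonf))
```

The third value shows the cap: 3 * 0.5 = 1.5, which is reported as 1.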

Before running the t-tests, it may also be best to run an ANOVA to check whether there are any differences among the genes overall. Columbia University has a helpful explanation of this approach on their stats page.

That said, I'll show you how to do both for the sake of answering the programming aspect of the question, but for those looking up the same question, please carefully review your methods before using this answer.

For the benefit of readers less familiar with scientific notation, I've displayed the following results without it, by setting options(scipen=999) in R.

Pre-test ANOVA:

library(tidyr)
summary(aov(val ~ as.factor(Gene), data = gather(df, key, val, -Gene)))

                Df     Sum Sq    Mean Sq F value     Pr(>F)    
as.factor(Gene)  2 2627772989 1313886494   34.49 0.00000245 ***
Residuals       15  571374752   38091650                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
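The F value in this table is the ratio of the two mean squares, and each mean square is its sum of squares divided by its degrees of freedom. As a quick arithmetic check that uses only the printed numbers:

```r
# Numbers copied from the ANOVA table above
ss_gene  <- 2627772989; df_gene  <- 2
ss_resid <-  571374752; df_resid <- 15

ms_gene  <- ss_gene / df_gene     # Mean Sq for Gene
ms_resid <- ss_resid / df_resid   # Mean Sq for Residuals
f_value  <- ms_gene / ms_resid    # F value

# Pr(>F): upper-tail probability of the F(2, 15) distribution
p_value <- pf(f_value, df_gene, df_resid, lower.tail = FALSE)

round(f_value, 2)
p_value
```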

T-tests:

library(broom)
library(dplyr)
library(tidyr)

gather(df, key, val, -Gene) %>% 
  do(data.frame(tidy(pairwise.t.test(.$val, .$Gene, p.adjust.method = "bonferroni"))))

  group1 group2       p.value
1      3      1 0.05691493022
2      4      1 0.00000209244
4      4      3 0.00018020669
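If you'd rather not depend on tidyr, dplyr and broom, a base-R sketch of the same all-pairs comparison could look like the following. It rebuilds a `df` from the question's table (assuming the blank S7 cells are NA), reshapes it to long format by hand, and passes the vectors straight to `pairwise.t.test`, which pools the SD by default and ignores NA values. Because of the NA assumption, its numbers are not guaranteed to match the output above.

```r
# Data from the question; blank S7 cells for genes 3 and 4 assumed to be NA
vals <- c(20000, 12032, 23948,  2794,  5870,   782, 699,
          15051, 17543, 18590, 21005, 22996, 26448,  NA,
          35023, 43092, 41858, 39637, 40933, 38865,  NA)
df <- data.frame(Gene = c(1, 3, 4),
                 matrix(vals, nrow = 3, byrow = TRUE,
                        dimnames = list(NULL, paste0("S", 1:7))))

# Wide -> long: unlist() walks S1, S2, ... column by column,
# so the gene labels repeat once per sample column
long <- data.frame(Gene = factor(rep(df$Gene, times = ncol(df) - 1)),
                   val  = unlist(df[-1], use.names = FALSE))

# All pairwise comparisons, Bonferroni-adjusted
res <- pairwise.t.test(long$val, long$Gene, p.adjust.method = "bonferroni")
res$p.value
```

The result is a lower-triangular matrix (rows "3" and "4", columns "1" and "3"); the empty cell is NA.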

For these tests, it doesn't matter if the genes have different numbers of observations; the code above will still run. However, it's generally good practice in R to code blank or missing values as NA. See this SO answer for a way to change values to NA.
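A minimal sketch of that clean-up, assuming (hypothetically) that the missing S7 cells arrived as empty strings in character columns; how blanks actually show up depends on how the file was read:

```r
# Hypothetical input: a few columns read as character, blanks stored as ""
raw <- data.frame(Gene = c("1", "3", "4"),
                  S6   = c("782", "26448", "38865"),
                  S7   = c("699", "", ""),
                  stringsAsFactors = FALSE)

# Replace blank strings with NA across the whole data frame
raw[raw == ""] <- NA

# Convert every column to numeric; NA stays NA
clean <- as.data.frame(lapply(raw, as.numeric))
str(clean)
```

If the data come from a file, `read.csv(..., na.strings = "")` achieves the same thing at read time.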

If you'd like to limit your t-tests to only a few gene comparisons, for example gene 1 vs. gene 3 and gene 1 vs. gene 4 but not gene 3 vs. gene 4, the simplest way is still to use the code above. Instead of applying the p-value correction inside pairwise.t.test, however, turn it off there with p.adjust.method = "none" (the default is "holm", which would otherwise already correct across all three comparisons) and apply the correction afterward, only to the p-values you want to assess. Try this:

res <- gather(df, key, val, -Gene) %>% 
  do(data.frame(tidy(pairwise.t.test(.$val, .$Gene, p.adjust.method = "none"))))

res <- res[res$group1==1 | res$group2 ==1,]

res$p.value <-  p.adjust(res$p.value, method = "bonferroni")

print(res)

  group1 group2        p.value
1      3      1 0.015989134399
2      4      1 0.000001458475

Note that the above applies the p-value correction only to the tests we've subset and want to assess, which in this example are the comparisons involving gene 1; comparisons not involving gene 1 are excluded.
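The same "gene 1 against each other gene" comparison can also be written with t.test directly, without a custom function beyond a one-line helper. This is a sketch, again assuming a `df` rebuilt from the question's table with NA for the blank S7 cells; `gene_values` is a hypothetical helper name. Note that t.test defaults to Welch's t-test (`var.equal = FALSE`), whereas pairwise.t.test pools the SD across all groups, so these p-values will differ from the ones above.

```r
# Data from the question; blank S7 cells for genes 3 and 4 assumed to be NA
vals <- c(20000, 12032, 23948,  2794,  5870,   782, 699,
          15051, 17543, 18590, 21005, 22996, 26448,  NA,
          35023, 43092, 41858, 39637, 40933, 38865,  NA)
df <- data.frame(Gene = c(1, 3, 4),
                 matrix(vals, nrow = 3, byrow = TRUE,
                        dimnames = list(NULL, paste0("S", 1:7))))

# Hypothetical helper: all non-missing values for one gene
gene_values <- function(g) {
  x <- unlist(df[df$Gene == g, -1], use.names = FALSE)
  x[!is.na(x)]
}

# Welch t-test of gene 1 against each other gene
others <- setdiff(df$Gene, 1)
p_raw  <- sapply(others, function(g) t.test(gene_values(1), gene_values(g))$p.value)

# Bonferroni correction over just these comparisons
res <- data.frame(group1  = others,
                  group2  = 1,
                  p.value = p.adjust(p_raw, method = "bonferroni"))
print(res)
```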