Grouped t-test with more than two levels in grouping factor
Problem description
I have the following data frame:
structure(list(Name = c("BACKGROUND_VL_1_100_H", "BACKGROUND_VL_1_100_G",
"BACKGROUND_VL_1_100_F", "BACKGROUND_VL_1_100_E", "BACKGROUND_VL_1_100_D",
"BACKGROUND_VL_1_100_C", "BACKGROUND_VL_1_100_B", "BACKGROUND_VL_1_100_A",
"BACKGROUND_VL_05_100_H", "BACKGROUND_VL_05_100_G", "BACKGROUND_VL_05_100_F",
"BACKGROUND_VL_05_100_E", "BACKGROUND_VL_05_100_D", "BACKGROUND_VL_05_100_C",
"BACKGROUND_VL_05_100_B", "BACKGROUND_VL_05_100_A", "BACKGROUND_VL_025_100_H",
"BACKGROUND_VL_025_100_G", "BACKGROUND_VL_025_100_F", "BACKGROUND_VL_025_100_E",
"BACKGROUND_VL_025_100_D", "BACKGROUND_VL_025_100_C", "BACKGROUND_VL_025_100_B",
"BACKGROUND_VL_025_100_A", "BACKGROUND_VL_0125_100_F", "BACKGROUND_VL_0125_100_E",
"BACKGROUND_VL_0125_100_D", "BACKGROUND_VL_0125_100_C", "BACKGROUND_VL_0125_100_B",
"BACKGROUND_VL_0125_100_A", "BACKGROUND_NEHC_0125_100_A", "BACKGROUND_NEHC_0125_100_B",
"BACKGROUND_NEHC_0125_100_C", "BACKGROUND_NEHC_0125_100_D", "BACKGROUND_NEHC_0125_100_E",
"BACKGROUND_NEHC_0125_100_F", "BACKGROUND_NEHC_0125_100_G", "BACKGROUND_NEHC_025_100_G",
"BACKGROUND_NEHC_025_100_F", "BACKGROUND_NEHC_025_100_D", "BACKGROUND_NEHC_025_100_C",
"BACKGROUND_NEHC_025_100_B", "BACKGROUND_NEHC_025_100_A", "BACKGROUND_NEHC_05_100_C",
"BACKGROUND_NEHC_05_100_H", "BACKGROUND_NEHC_05_100_G", "BACKGROUND_NEHC_05_100_F",
"BACKGROUND_NEHC_05_100_D", "BACKGROUND_NEHC_05_100_C", "BACKGROUND_NEHC_05_100_B",
"BACKGROUND_NEHC_05_100_A"), ID = c(24, 23, 22, 21, 20, 19, 18,
17, 24, 23, 22, 21, 20, 19, 18, 17, 24, 23, 22, 21, 20, 19, 18,
17, 14, 13, 12, 11, 10, 9, 7, 6, 5, 4, 3, 2, 1, 21, 20, 19, 18,
17, 16, 15, 23, 22, 21, 20, 19, 18, 17), Conc_factor = c(1, 1,
1, 1, 1, 1, 1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.25,
0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.125, 0.125, 0.125,
0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125,
0.125, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.5, 0.5, 0.5, 0.5,
0.5, 0.5, 0.5, 0.5), Peptide_factor = c("Background", "Background",
"Background", "Background", "Background", "Background", "Background",
"Background", "Background", "Background", "Background", "Background",
"Background", "Background", "Background", "Background", "Background",
"Background", "Background", "Background", "Background", "Background",
"Background", "Background", "Background", "Background", "Background",
"Background", "Background", "Background", "Background", "Background",
"Background", "Background", "Background", "Background", "Background",
"Background", "Background", "Background", "Background", "Background",
"Background", "Background", "Background", "Background", "Background",
"Background", "Background", "Background", "Background"), serum_factor = c("VL",
"VL", "VL", "VL", "VL", "VL", "VL", "VL", "VL", "VL", "VL", "VL",
"VL", "VL", "VL", "VL", "VL", "VL", "VL", "VL", "VL", "VL", "VL",
"VL", "VL", "VL", "VL", "VL", "VL", "VL", "NEHC", "NEHC", "NEHC",
"NEHC", "NEHC", "NEHC", "NEHC", "NEHC", "NEHC", "NEHC", "NEHC",
"NEHC", "NEHC", "NEHC", "NEHC", "NEHC", "NEHC", "NEHC", "NEHC",
"NEHC", "NEHC"), dilution_factor = c(100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100), mean_fluorescence = c(17399.95703125,
17554.48828125, 17206.38671875, 17961.63671875, 17531.802734375,
18382.783203125, 17886.12890625, 17760.802734375, 18121.12109375,
18030.228515625, 18016.548828125, 17790.91015625, 17892.90625,
18479.763671875, 17880.212890625, 17876.267578125, 17338.04296875,
17497.556640625, 17575.44140625, 16903.13671875, 17713.2109375,
18043.900390625, 17703.81640625, 17848.75, 16977.166015625, 17366.0390625,
16957.97265625, 16449.564453125, 16725.259765625, 16712.982421875,
19181.806640625, 18695.166015625, 18568.4453125, 18718.474609375,
18195.10546875, 17979.955078125, 17738.958984375, 19387.955078125,
19103.15625, 18983.361328125, 18790.640625, 18412.255859375,
18014.478515625, 17973.759765625, 19574.638671875, 17291.458984375,
18660.455078125, 18704.978515625, 17241.298828125, 18838.076171875,
17792.349609375)), row.names = c(NA, -51L), class = c("tbl_df",
"tbl", "data.frame"), .Names = c("Name", "ID", "Conc_factor",
"Peptide_factor", "serum_factor", "dilution_factor", "mean_fluorescence"
))
What I want to do is compare the means of mean_fluorescence after grouping by Conc_factor and serum_factor.
To better illustrate, if I run the following code:
library(dplyr)
backgound_dil100 %>% group_by(Conc_factor, serum_factor) %>% summarise(means_mean_fluorescence = mean(mean_fluorescence))
I will get the following table:
  Conc_factor serum_factor means_mean_fluorescence
        <dbl> <chr>                          <dbl>
1       0.125 NEHC                          18440.
2       0.125 VL                            16865.
3       0.250 NEHC                          18782.
4       0.250 VL                            17578.
5       0.500 NEHC                          18260.
6       0.500 VL                            18011.
7       1.00  VL                            17710.
For each Conc_factor, I want to compare the means of NEHC and VL and test whether they (means_mean_fluorescence) differ significantly.
If I do:
library(broom)
backgound_dil100 %>% group_by(Conc_factor, serum_factor) %>% do(tidy(t.test(mean_fluorescence~serum_factor, data = .)))
I will get the following error message:
Error in t.test.formula(mean_fluorescence ~ serum_factor, data = .) :
grouping factor must have exactly 2 levels
This partly makes sense to me; after all, I have four levels in Conc_factor. However, I have exactly two levels in serum_factor, and that is the factor I am actually trying to compare.
Does anyone know a way to run this t.test separately for each level of a factor that has more than two levels?
First of all, one combination is missing: there are no NEHC observations at Conc_factor = 1, as seen here:
table(backgound_dil100$serum_factor, backgound_dil100$Conc_factor)
       0.125 0.25 0.5 1
  NEHC     7    6   8 0
  VL       6    8   8 8
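Therefore, remove the rows with Conc_factor = 1 (the first eight rows), since that concentration has only one serum group and cannot be tested. Also, as Jimbou advised, remove serum_factor from group_by(): t.test() needs it as its own grouping variable, and grouping by it first leaves only one level in each group.
The call then becomes: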
# Drop rows 1-8 (Conc_factor == 1, VL only), then run one t-test per concentration
backgound_dil100[-c(1:8), ] %>%
  group_by(Conc_factor) %>%
  do(tidy(t.test(mean_fluorescence ~ serum_factor, data = .)))
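Dropping rows by position works only as long as the row order stays fixed. As a minimal sketch of the same idea, the following keeps only the concentrations that contain both serum groups and then runs one t-test per concentration. It is an assumption-laden illustration, not the original answer's code: the data frame `toy` and its values are made up (only the column names match the question), and `group_modify()` is used in place of `do()`, which assumes a reasonably recent dplyr.

```r
library(dplyr)
library(broom)

# Hypothetical toy data: same column names as the question, made-up values.
# Conc_factor == 1 has only VL rows, mimicking the missing NEHC group.
set.seed(1)
toy <- data.frame(
  Conc_factor = rep(c(1, 0.5, 0.25), times = c(4, 8, 8)),
  serum_factor = c(rep("VL", 4),
                   rep(c("VL", "NEHC"), each = 4),
                   rep(c("VL", "NEHC"), each = 4)),
  mean_fluorescence = rnorm(20, mean = 18000, sd = 500)
)

result <- toy %>%
  group_by(Conc_factor) %>%
  # Keep only concentrations where both serum groups are present
  filter(n_distinct(serum_factor) == 2) %>%
  # One Welch t-test (the t.test() default) per remaining concentration
  group_modify(~ tidy(t.test(mean_fluorescence ~ serum_factor, data = .x))) %>%
  ungroup()

print(result)
```

With the question's data, replacing `toy` with `backgound_dil100` should drop Conc_factor = 1 automatically, because that concentration has no NEHC rows.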