Apply t-test on many columns in a dataframe split by factor
I have a dataframe with one factor column with two levels, and many numeric columns. I want to split the dataframe by the factor column and run a t-test on each pair of columns.
Using the example dataset Puromycin I want the result to look something like this:
Variable  Treated  Untreated  p-value  Test-statistic  CI of difference
Conc      0.3450   0.2763     XXX      T               XX - XX
Rate      141.58   110.7272   xxx      T               XX - XX
I think I am looking for a solution using plyr that can output the above results in a nice dataframe.
(The Puromycin only contains two numeric variables, but the solution I am looking for would work on a dataframe with many numeric variables)
UPDATE - I will try to clarify what I mean.
I would like to go from data that looks like this:
Grouping variable var1 var2 var3 var4 var5
1 3 5 7 3 7
1 3 7 5 9 6
1 5 2 6 7 6
1 9 5 7 0 8
1 2 4 5 7 8
1 2 3 1 6 4
2 4 2 7 6 5
2 0 8 3 7 5
2 1 2 3 5 9
2 1 5 3 8 0
2 2 6 9 0 7
2 3 6 7 8 8
2 10 6 3 8 0
To a result dataframe that looks like this:
"Mean in group 1" "Mean in group 2" "P-value of difference" "N"
var1 ## ## ## ##
var2 ## ## ## ##
var3 ## ## ## ##
var4 ## ## ## ##
var5 ## ## ## ##
Maybe what I am looking for is something with mapply: I want to split my dataframe into dataframe1 and dataframe2 by a two-level factor, apply a function (a t-test) to the first columns of dataframe1 and dataframe2, then to the second columns, then to the third columns, and so on for all the column pairs generated by the split.
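The split-and-pair idea described above could be sketched with mapply roughly as follows. This is only an illustration using the example data from the question; the names parts and p_values are made up for the sketch, and it returns just the p-values rather than the full result table.

```r
# Example data from the question; Group is the two-level grouping column
df <- read.table(text = "Group var1 var2 var3 var4 var5
1 3 5 7 3 7
1 3 7 5 9 6
1 5 2 6 7 6
1 9 5 7 0 8
1 2 4 5 7 8
1 2 3 1 6 4
2 4 2 7 6 5
2 0 8 3 7 5
2 1 2 3 5 9
2 1 5 3 8 0
2 2 6 9 0 7
2 3 6 7 8 8
2 10 6 3 8 0", header = TRUE)

# Split the numeric columns into one data frame per level of Group
parts <- split(df[-1], df$Group)

# mapply walks over the columns of both data frames in parallel,
# so column i of the first part is tested against column i of the second
p_values <- mapply(function(a, b) t.test(a, b)$p.value,
                   parts[[1]], parts[[2]])
p_values
```

Because a data frame is a list of columns, mapply pairs the columns by position, which only makes sense when both parts have the same columns in the same order (true here, since they come from the same split).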
Maybe this produces the result you are looking for:
df <- read.table(text="Group var1 var2 var3 var4 var5
1 3 5 7 3 7
1 3 7 5 9 6
1 5 2 6 7 6
1 9 5 7 0 8
1 2 4 5 7 8
1 2 3 1 6 4
2 4 2 7 6 5
2 0 8 3 7 5
2 1 2 3 5 9
2 1 5 3 8 0
2 2 6 9 0 7
2 3 6 7 8 8
2 10 6 3 8 0", header = TRUE)
# Run a t-test of every numeric column against Group; one row per variable
t(sapply(df[-1], function(x)
  unlist(t.test(x ~ df$Group)[c("estimate", "p.value", "statistic", "conf.int")])))
The result:
     estimate.mean in group 1 estimate.mean in group 2   p.value statistic.t conf.int1 conf.int2
var1                 4.000000                 3.000000 0.5635410   0.5955919 -2.696975  4.696975
var2                 4.333333                 5.000000 0.5592911  -0.6022411 -3.104788  1.771454
var3                 5.166667                 5.000000 0.9028444   0.1249164 -2.770103  3.103436
var4                 5.333333                 6.000000 0.7067827  -0.3869530 -4.497927  3.164593
var5                 6.500000                 4.857143 0.3053172   1.0925986 -1.803808  5.089522
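The result above is a matrix and does not include the N column asked for in the question. If a data frame with plain column names and a count is preferred, a minimal sketch along the following lines might work. The helper name t_test_by_group and its column names are hypothetical, chosen only for this example, and it relies on t.test's default (the Welch two-sample test), as the code above does.

```r
# Example data from the question; Group is the two-level grouping column
df <- read.table(text = "Group var1 var2 var3 var4 var5
1 3 5 7 3 7
1 3 7 5 9 6
1 5 2 6 7 6
1 9 5 7 0 8
1 2 4 5 7 8
1 2 3 1 6 4
2 4 2 7 6 5
2 0 8 3 7 5
2 1 2 3 5 9
2 1 5 3 8 0
2 2 6 9 0 7
2 3 6 7 8 8
2 10 6 3 8 0", header = TRUE)

# Hypothetical helper: one row of t-test results per numeric column
t_test_by_group <- function(data, group_col) {
  g <- factor(data[[group_col]])
  stopifnot(nlevels(g) == 2)  # a two-sample t-test needs exactly two groups
  # All numeric columns except the grouping column itself
  vars <- setdiff(names(data)[sapply(data, is.numeric)], group_col)
  rows <- lapply(vars, function(v) {
    x <- data[[v]]
    tt <- t.test(x ~ g)  # Welch two-sample t-test (R's default)
    data.frame(
      variable = v,
      mean_1   = unname(tt$estimate[1]),  # mean in the first factor level
      mean_2   = unname(tt$estimate[2]),  # mean in the second factor level
      p_value  = tt$p.value,
      t_stat   = unname(tt$statistic),
      ci_low   = tt$conf.int[1],
      ci_high  = tt$conf.int[2],
      n        = sum(!is.na(x) & !is.na(g))  # observations used in the test
    )
  })
  do.call(rbind, rows)
}

res <- t_test_by_group(df, "Group")
print(res)
```

The same helper should also apply to the Puromycin data mentioned in the question, for example t_test_by_group(Puromycin, "state"), since state is its two-level factor and conc and rate are its numeric columns.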