Apply t-test on many columns in a dataframe split by factor

Posted: 2017/3/25 22:58:44 | Tags: r, dataframe, plyr


Question

I have a dataframe with one factor column with two levels, and many numeric columns. I want to split the dataframe by the factor column and do a t-test on each pair of columns.

Using the example dataset Puromycin I want the result to look something like this:
```
Variable  Treated  Untreated  p-value  Test-statistic  CI of difference
Conc      0.3450   0.2763     XXX      T               XX - XX
Rate      141.58   110.7272   XXX      T               XX - XX
```
I think I am looking for a solution using plyr that can output the above results in a nice dataframe.

(Puromycin contains only two numeric variables, but the solution I am looking for should work on a dataframe with many numeric variables.)
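For the Puromycin case itself, here is a minimal base-R sketch (plyr is not needed). It assumes that the grouping factor is Puromycin's `state` column, with levels `treated` and `untreated`. It uses `split()` so that the "Treated" mean always comes from the treated group, and it loops over every numeric column, so the same code should work on wider data frames. The output column names are illustrative choices.

```r
# Puromycin ships with R's datasets package: numeric conc and rate, factor state
data(Puromycin)

# Pick out every numeric column, so the same code works for many variables
num_vars <- names(Puromycin)[sapply(Puromycin, is.numeric)]

res <- do.call(rbind, lapply(num_vars, function(v) {
  # split() returns a list named by factor level, so the groups are explicit
  g  <- split(Puromycin[[v]], Puromycin$state)
  tt <- t.test(g$treated, g$untreated)  # Welch two-sample t-test (R's default)
  data.frame(Variable    = v,
             Treated     = mean(g$treated),
             Untreated   = mean(g$untreated),
             p.value     = tt$p.value,
             t.statistic = unname(tt$statistic),
             CI.lower    = tt$conf.int[1],
             CI.upper    = tt$conf.int[2])
}))
print(res)
```

Each row is one variable, and `CI.lower`/`CI.upper` bound the difference in means (treated minus untreated).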

UPDATE - I will try to clarify what I mean.

I would like to go from data that look like this:
```
Grouping variable  var1  var2  var3  var4  var5
1                  3     5     7     3     7
1                  3     7     5     9     6
1                  5     2     6     7     6
1                  9     5     7     0     8
1                  2     4     5     7     8
1                  2     3     1     6     4
2                  4     2     7     6     5
2                  0     8     3     7     5
2                  1     2     3     5     9
2                  1     5     3     8     0
2                  2     6     9     0     7
2                  3     6     7     8     8
2                  10    6     3     8     0
```
To a result dataframe that looks like this:
```
      Mean in group 1  Mean in group 2  P-value of difference  N
var1  ##               ##               ##                     ##
var2  ##               ##               ##                     ##
var3  ##               ##               ##                     ##
var4  ##               ##               ##                     ##
var5  ##               ##               ##                     ##
```
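The target table above has an `N` column, which a bare `t.test()` result does not report. The sketch below produces exactly those four columns. The question does not define `N`, so this assumes it means the number of non-missing observations in that column across both groups. Per-group counts would need two columns instead.

```r
# Same example data as in the question, built inline
df <- data.frame(
  Group = rep(1:2, c(6, 7)),
  var1  = c(3, 3, 5, 9, 2, 2, 4, 0, 1, 1, 2, 3, 10),
  var2  = c(5, 7, 2, 5, 4, 3, 2, 8, 2, 5, 6, 6, 6),
  var3  = c(7, 5, 6, 7, 5, 1, 7, 3, 3, 3, 9, 7, 3),
  var4  = c(3, 9, 7, 0, 7, 6, 6, 7, 5, 8, 0, 8, 8),
  var5  = c(7, 6, 6, 8, 8, 4, 5, 5, 9, 0, 7, 8, 0)
)

res <- t(sapply(df[-1], function(x) {
  tt <- t.test(x ~ df$Group)  # Welch two-sample t-test (R's default)
  c("Mean in group 1"       = unname(tt$estimate[1]),
    "Mean in group 2"       = unname(tt$estimate[2]),
    "P-value of difference" = tt$p.value,
    # Assumption: N = non-missing observations in this column, both groups
    "N"                     = sum(!is.na(x)))
}))
res <- as.data.frame(res)
print(res)
```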
Maybe `mapply` is what I am looking for. I want to split my dataframe by the two-level factor into dataframe1 and dataframe2. Then I want to apply a function (a t-test) to the first columns of dataframe1 and dataframe2, then to their second columns, then to their third columns, and so on for every column pair produced by the split.
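The `mapply` idea described above works as stated. `split()` on a data frame returns one data frame per factor level. A data frame is a list of columns, so `mapply` walks the two halves column by column. This sketch keeps only the p-values to show the mechanics. The function passed to `mapply` could return more of the `t.test()` result instead.

```r
# Same example data as in the question, built inline
df <- data.frame(
  Group = rep(1:2, c(6, 7)),
  var1  = c(3, 3, 5, 9, 2, 2, 4, 0, 1, 1, 2, 3, 10),
  var2  = c(5, 7, 2, 5, 4, 3, 2, 8, 2, 5, 6, 6, 6),
  var3  = c(7, 5, 6, 7, 5, 1, 7, 3, 3, 3, 9, 7, 3),
  var4  = c(3, 9, 7, 0, 7, 6, 6, 7, 5, 8, 0, 8, 8),
  var5  = c(7, 6, 6, 8, 8, 4, 5, 5, 9, 0, 7, 8, 0)
)

# One data frame of numeric columns per level of Group ("1" and "2")
parts <- split(df[-1], df$Group)

# Pair var1 with var1, var2 with var2, ... and run a Welch t-test on each pair;
# mapply names the result after the columns of its first argument
p_values <- mapply(function(a, b) t.test(a, b)$p.value, parts[[1]], parts[[2]])
print(p_values)
```

This gives the same p-values as the formula interface `t.test(x ~ Group)`, because both default to Welch's test.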
Solution
Maybe this produces the result you are looking for:
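```r
# Read the example data; Group is the two-level grouping variable
df <- read.table(text="Group   var1    var2    var3    var4    var5
1           3   5   7   3   7
1           3   7   5   9   6
1           5   2   6   7   6
1           9   5   7   0   8
1           2   4   5   7   8
1           2   3   1   6   4
2           4   2   7   6   5
2           0   8   3   7   5
2           1   2   3   5   9
2           1   5   3   8   0
2           2   6   9   0   7
2           3   6   7   8   8
2           10  6   3   8   0", header = TRUE)

# For every column except Group, run a two-sample t-test (Welch by default),
# keep the selected parts of the "htest" result and flatten them into one
# named vector; t() turns the sapply output into one row per variable
t(sapply(df[-1], function(x)
     unlist(t.test(x ~ df$Group)[c("estimate", "p.value", "statistic", "conf.int")])))
```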
The result:
```
     estimate.mean in group 1 estimate.mean in group 2   p.value statistic.t conf.int1 conf.int2
var1                 4.000000                 3.000000 0.5635410   0.5955919 -2.696975  4.696975
var2                 4.333333                 5.000000 0.5592911  -0.6022411 -3.104788  1.771454
var3                 5.166667                 5.000000 0.9028444   0.1249164 -2.770103  3.103436
var4                 5.333333                 6.000000 0.7067827  -0.3869530 -4.497927  3.164593
var5                 6.500000                 4.857143 0.3053172   1.0925986 -1.803808  5.089522
```
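The answer's code returns a matrix with the variable names stored as row names. The question asked for "a nice dataframe". One way to convert the result is to move the row names into a `variable` column and shorten the headers. The new column names are illustrative choices, not names that `t.test()` produces.

```r
# Same example data as in the answer, built inline
df <- data.frame(
  Group = rep(1:2, c(6, 7)),
  var1  = c(3, 3, 5, 9, 2, 2, 4, 0, 1, 1, 2, 3, 10),
  var2  = c(5, 7, 2, 5, 4, 3, 2, 8, 2, 5, 6, 6, 6),
  var3  = c(7, 5, 6, 7, 5, 1, 7, 3, 3, 3, 9, 7, 3),
  var4  = c(3, 9, 7, 0, 7, 6, 6, 7, 5, 8, 0, 8, 8),
  var5  = c(7, 6, 6, 8, 8, 4, 5, 5, 9, 0, 7, 8, 0)
)

# The answer's one-liner: a matrix with one row per variable
res <- t(sapply(df[-1], function(x)
  unlist(t.test(x ~ df$Group)[c("estimate", "p.value", "statistic", "conf.int")])))

# Move the row names into a column and give the six result columns short names
out <- data.frame(variable = rownames(res), unname(res))
names(out) <- c("variable", "mean_group1", "mean_group2",
                "p_value", "t_statistic", "ci_lower", "ci_upper")
rownames(out) <- NULL
print(out)
```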


                        