R t.test inside ddply gives error "grouping factor must have exactly 2 levels"


Question

I have a data frame with several factors and two phenotypes:

freq sampleID status score snpsincluded
0.5 0001 case 100 all 
0.2 0001 case 30 all 
0.5 0002 control 110 all 
0.5 0003 case 100 del 
etc

I would like to run a t.test comparing cases and controls for each combination of the relevant factors. I have tried the following:

o2 <- ddply(df, c("freq","snpsincluded"), summarise, pval=t.test(score, status)$p.value)

but it complains that "grouping factor must have exactly 2 levels".

I have no missing values or NAs, and I've checked:

levels(df$status)
[1] "case"    "control"

Am I missing something stupid? Thanks!

Recommended answer

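You get this error because, for at least one subgroup, every score has the same status value. levels(df$status) reports the levels of the whole column, so it can show both "case" and "control" even when a given subgroup contains only one of them.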

This reproduces the error: the status is the same (equal to 1) for all scores.

dx = read.table(text='   score status
1 1 1 
2 2 1 
3 3 1 ')

t.test(score ~ status, data = dx) 
Error in t.test.formula(score ~ status, data = dx) : 
  grouping factor must have exactly 2 levels

Adding a second status value fixes that error but triggers another known t.test problem: each group needs enough observations (I think >= 2):

dx = read.table(text='   score status
1 1 1 
2 2 1 
3 3 2 ')

t.test(score ~ status, data = dx) 
Error in t.test.default(x = 1:2, y = 3L) : not enough 'y' observations

Finally, this fixes all the problems:

dx = read.table(text='   score status
1 1 1 
2 2 1 
3 3 2 
4 4 2')

t.test(score ~ status, data = dx) 

Welch Two Sample t-test

data:  score by status
t = -2.8284, df = 2, p-value = 0.1056
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.042435  1.042435
sample estimates:
mean in group 1 mean in group 2 
            1.5             3.5 
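
To see which subgroups of a data frame would trigger either of the errors above, one option is to cross-tabulate subgroup against status. The following is a minimal sketch in base R; the data frame is a made-up toy shaped like the one in the question, and the column `group` and the variables `counts` and `bad_groups` are names introduced here for illustration only.

```r
# Toy data shaped like the question's data frame (values are illustrative only)
df <- data.frame(
  freq         = c(0.5, 0.5, 0.5, 0.5, 0.2, 0.5),
  snpsincluded = c("all", "all", "all", "all", "all", "del"),
  status       = factor(c("case", "case", "control", "control", "case", "case"),
                        levels = c("case", "control")),
  score        = c(100, 90, 110, 120, 30, 100)
)

# levels() describes the whole column, not the individual subgroups
levels(df$status)

# One label per freq/snpsincluded combination that actually occurs
df$group <- interaction(df$freq, df$snpsincluded, drop = TRUE)

# Rows are subgroups, columns are status values; a count of 0 or 1
# marks a subgroup on which t.test would fail
counts <- table(df$group, df$status)
print(counts)

# Names of the subgroups with fewer than 2 observations in either status
bad_groups <- rownames(counts)[apply(counts < 2, 1, any)]
print(bad_groups)
```

A zero in the table corresponds to the "grouping factor must have exactly 2 levels" error, and a one corresponds to the "not enough observations" error.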

EDIT: I explained the problem without giving a solution because you didn't provide a reproducible example.

One solution is to run the computation only for the good groups:

  ddply(df, c("freq","snpsincluded"), function(x)
      { 
       # run the test only when both status values occur in this subgroup
       if(length(unique(x$status)) == 2)
         pval=t.test(score~status,data=x)$p.value
     })
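
The check above only guards against a subgroup with a single status value; a subgroup with just one observation in one of the two statuses would still fail with the "not enough observations" error. A slightly more defensive variant might look like the sketch below. `safe_pval` is a hypothetical helper name, not part of plyr or of the original answer, and the toy data frame is invented for illustration. The example uses base R `split` and `sapply` so that it runs without plyr installed.

```r
# Hypothetical helper (not from the original answer): returns the t-test
# p-value for one subgroup, or NA when the test cannot be run
safe_pval <- function(x) {
  n <- table(x$status)  # number of observations per status value
  if (length(n) == 2 && all(n >= 2)) {
    t.test(score ~ status, data = x)$p.value
  } else {
    NA_real_
  }
}

# Toy data shaped like the question's data frame (values are illustrative only)
df <- data.frame(
  freq         = c(0.5, 0.5, 0.5, 0.5, 0.2, 0.5),
  snpsincluded = c("all", "all", "all", "all", "all", "del"),
  status       = factor(c("case", "case", "control", "control", "case", "case"),
                        levels = c("case", "control")),
  score        = c(100, 90, 110, 120, 30, 100)
)

# Apply the helper to every freq/snpsincluded subgroup that occurs in the data
pvals <- sapply(split(df, list(df$freq, df$snpsincluded), drop = TRUE), safe_pval)
print(pvals)
```

With plyr loaded, the same helper could be used as `ddply(df, c("freq","snpsincluded"), function(x) data.frame(pval = safe_pval(x)))`, which keeps the skipped subgroups in the result with an NA p-value instead of dropping them. This sketch does not handle subgroups whose scores are all identical, for which t.test raises a separate error.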
