Correct use of sapply with Anova on multiple subsets in R

Question

I am trying to run a two-way ANOVA on multiple subsets of a data frame without having to subset the data by hand for each group, as this is inefficient.

Example data:

DF<-structure(list(Sample = c(666L, 676L, 686L, 667L, 677L, 687L, 
822L, 832L, 842L, 824L, 834L, 844L), Time = c(300L, 300L, 300L, 
300L, 300L, 300L, 400L, 400L, 400L, 400L, 400L, 400L), Ploidy = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("2n", 
"3n"), class = "factor"), Tissue = c("muscle", "muscle", "muscle", 
"liver", "liver", "liver", "intestine", "intestine", "intestine", 
"gill", "gill", "gill"), X.lipid = c(1.1, 0.8, 1.3, 3.7, 3.9, 
3.8, 5.2, 3.4, 6, 7.6, 10.4, 6.7), l.dec = c(0.011, 0.008, 0.013, 
0.037, 0.039, 0.038, 0.052, 0.034, 0.06, 0.076, 0.104, 0.067), 
l.arc = c(0.105074124512229, 0.0895624074394449, 0.114266036973812, 
0.193560218793138, 0.19879088899975, 0.196192082631721, 0.230059118691331, 
0.185452088760136, 0.247467063170448, 0.279298057669285, 
0.328359182374352, 0.261824790465914)), .Names = c("Sample", 
"Time", "Ploidy", "Tissue", "X.lipid", "l.dec", "l.arc"), row.names = c(1L, 
2L, 3L, 4L, 5L, 6L, 69L, 70L, 71L, 72L, 73L, 74L), class = "data.frame")

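I have come across similar examples: "Anova, for loop to apply function" and "ANOVA on multiple responses, by multiple groups NOT part of formula".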

I can get close, but I do not believe this is correct, as it uses aov rather than anova:

x<- unique(DF$Tissue)

sapply(x, function(my) {
f <- as.formula(paste("l.dec~Time*Ploidy"))
aov(f, data=DF)
}, simplify=FALSE)

If I switch aov for anova, it returns an error message:

 Error in UseMethod("anova") : 
 no applicable method for 'anova' applied to an object of class "formula" 

The long-winded but correct approach is:

#Subset by each Tissue type (just one here for e.g.)
muscle<- subset (DF, Tissue == "muscle")
#Perform Anova
anova(lm(l.dec ~ Ploidy * Time, data = muscle))

However, in the main data frame I have many tissue types and want to avoid performing this subset for each one.

I believe the apply approach is close, but I need help with the final stages.

Recommended answer

Building on the comments above from @user20650 and myself, I would suggest first using sapply with lm to generate a list of models, and then using sapply again on that list to generate the ANOVA tables. That way the list of models stays available, so you can also get coefficients, fitted values, residuals, etc.

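x <- unique(DF$Tissue)

# Fit one linear model per tissue; lm's subset argument selects the rows,
# so no separate subsetted data frames are needed
models <- sapply(x, function(my) {
  lm(l.dec ~ Time * Ploidy, data = DF, subset = Tissue == my)
}, simplify = FALSE)

# anova() needs a fitted model, not a formula, so apply it to each lm object
ANOVA.tables <- sapply(models, anova, simplify = FALSE)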
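
As a rough end-to-end illustration of the two-step approach, here is a minimal sketch on synthetic data. The data frame `toy`, its random `l.dec` values, and the names `toy_models`, `toy_tables` and `toy_coefs` are all made up for this example. Synthetic data is used because the 12 sample rows in the question have only one Time value and one Ploidy level per tissue, so the interaction model cannot be fitted meaningfully on them as they stand.

```r
# Synthetic data (hypothetical values): every tissue has both time points
# and both ploidies, with three replicates per combination.
set.seed(1)
toy <- expand.grid(
  Tissue = c("muscle", "liver"),
  Time   = c(300, 400),
  Ploidy = c("2n", "3n"),
  rep    = 1:3,
  stringsAsFactors = FALSE
)
toy$l.dec <- runif(nrow(toy), min = 0.01, max = 0.10)

tissues <- unique(toy$Tissue)

# Step 1: one fitted model per tissue. Because `tissues` is a character
# vector, sapply() names the list elements after the tissues.
toy_models <- sapply(tissues, function(my) {
  lm(l.dec ~ Time * Ploidy, data = toy, subset = Tissue == my)
}, simplify = FALSE)

# Step 2: one ANOVA table per fitted model.
toy_tables <- sapply(toy_models, anova, simplify = FALSE)

# The model list is still available for other summaries, e.g. coefficients
# (one column per tissue).
toy_coefs <- sapply(toy_models, coef)

toy_tables$muscle
```

Individual tables can be pulled out by name, for example `toy_tables$liver` or `toy_tables[["liver"]]`.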
