Correct use of sapply with Anova on multiple subsets in R


Question

I am trying to run a two-way ANOVA on multiple subsets of a data frame without having to subset the data by hand each time, as this is inefficient.

Example data:

DF<-structure(list(Sample = c(666L, 676L, 686L, 667L, 677L, 687L, 
822L, 832L, 842L, 824L, 834L, 844L), Time = c(300L, 300L, 300L, 
300L, 300L, 300L, 400L, 400L, 400L, 400L, 400L, 400L), Ploidy = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("2n", 
"3n"), class = "factor"), Tissue = c("muscle", "muscle", "muscle", 
"liver", "liver", "liver", "intestine", "intestine", "intestine", 
"gill", "gill", "gill"), X.lipid = c(1.1, 0.8, 1.3, 3.7, 3.9, 
3.8, 5.2, 3.4, 6, 7.6, 10.4, 6.7), l.dec = c(0.011, 0.008, 0.013, 
0.037, 0.039, 0.038, 0.052, 0.034, 0.06, 0.076, 0.104, 0.067), 
l.arc = c(0.105074124512229, 0.0895624074394449, 0.114266036973812, 
0.193560218793138, 0.19879088899975, 0.196192082631721, 0.230059118691331, 
0.185452088760136, 0.247467063170448, 0.279298057669285, 
0.328359182374352, 0.261824790465914)), .Names = c("Sample", 
"Time", "Ploidy", "Tissue", "X.lipid", "l.dec", "l.arc"), row.names = c(1L, 
2L, 3L, 4L, 5L, 6L, 69L, 70L, 71L, 72L, 73L, 74L), class = "data.frame")

I have come across similar examples: "Anova, for loop to apply function" and "ANOVA on multiple responses, by multiple groups NOT part of formula".

I can get close, but I do not believe this is correct, as it uses aov rather than anova:

x<- unique(DF$Tissue)

sapply(x, function(my) {
f <- as.formula(paste("l.dec~Time*Ploidy"))
aov(f, data=DF)
}, simplify=FALSE)

If I switch aov for anova, it returns an error message:

 Error in UseMethod("anova") : 
 no applicable method for 'anova' applied to an object of class "formula" 
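
The error message points at the difference between the two functions: aov() accepts a formula and fits the model itself, while anova() dispatches on an already fitted model object and has no method for class "formula". A minimal sketch of that difference, using a small made-up data frame (`toy`, `y` and `g` are hypothetical names, not part of the question's data):

```r
# Hypothetical toy data, not the DF from the question
set.seed(1)
toy <- data.frame(
  y = rnorm(8),
  g = factor(rep(c("a", "b"), each = 4))
)

f <- y ~ g

# anova() has no method for a bare formula, so this call fails;
# try() captures the error instead of stopping the script
res <- try(anova(f), silent = TRUE)
inherits(res, "try-error")

# Fitting the model first gives anova() an "lm" object to dispatch on
fit <- lm(f, data = toy)
tab <- anova(fit)
class(tab)
```

So the formula has to pass through lm() (or another model-fitting function) before anova() can build a table from it.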

The long way around, which gives the correct result, is as follows:

#Subset by each Tissue type (just one here for e.g.)
muscle<- subset (DF, Tissue == "muscle")
#Perform Anova
anova(lm(l.dec ~ Ploidy * Time, data = muscle))

However, in the main data frame I have many tissue types and want to avoid performing this subset for each one.

I believe the sapply approach is close, but I need help with the final stages.

Recommended answer

Building on @user20650's comments and my own above, I would suggest first using sapply with lm to generate a list of models, and then using sapply again on that list to generate the ANOVA tables. That way the list of models stays available, so you can also extract coefficients, fitted values, residuals and so on.

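x <- unique(DF$Tissue)

# Fit one model per tissue; the subset argument restricts each fit to that tissue's rows
models <- sapply(x, function(my) {
  lm(l.dec ~ Time * Ploidy, data = DF, subset = Tissue == my)
}, simplify = FALSE)

# Build an ANOVA table from each fitted model
ANOVA.tables <- sapply(models, anova, simplify = FALSE)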
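
As a rough illustration of what the retained model list makes possible, the sketch below repeats the two sapply steps on a made-up data frame and then pulls coefficients and p-values out of the results. The data frame `toy` and the objects `coefs` and `pvals` are hypothetical. The toy data is built so that every tissue contains both ploidy levels and both time points, which the interaction model needs; the 12-row excerpt in the question has only one ploidy and one time per tissue, so it would not be enough on its own.

```r
# Hypothetical balanced data: each tissue has both ploidies and both times
set.seed(42)
toy <- expand.grid(
  Tissue = c("muscle", "liver"),
  Ploidy = factor(c("2n", "3n")),
  Time   = c(300, 400),
  rep    = 1:3,
  stringsAsFactors = FALSE
)
toy$l.dec <- runif(nrow(toy), 0.01, 0.1)  # made-up response values

x <- unique(toy$Tissue)

# Same pattern as above: one lm per tissue, then one ANOVA table per model
models <- sapply(x, function(my) {
  lm(l.dec ~ Time * Ploidy, data = toy, subset = Tissue == my)
}, simplify = FALSE)
ANOVA.tables <- sapply(models, anova, simplify = FALSE)

# Coefficients of every model, one column per tissue
coefs <- sapply(models, coef)

# p-values of the three model terms per tissue (the Residuals row is dropped)
pvals <- sapply(ANOVA.tables, function(tab) tab[["Pr(>F)"]][1:3])

names(models)
dim(coefs)
```

Because sapply keeps the tissue names as list names, an individual result can be looked up directly, for example `ANOVA.tables$muscle` or `residuals(models$liver)`.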
