R中的方差分解和比较(正交单个df) [英] partition of anova and comparisons (orthogonal single df) in r
问题描述
我想在方差分析(固定或混合模型)中做单个df正交对比.这只是示例:
I want to do single df orthogonal contrast in anova (fixed or mixed model). Here is just example:
require(nlme)
data (Alfalfa)
Variety: a factor with levels Cossack, Ladak, and Ranger
Date : a factor with levels None S1 S20 O7
Block: a factor with levels 1 2 3 4 5 6
Yield : a numeric vector
这些数据在Snedecor和Cochran(1980)中进行了描述 图的设计.实验中使用的处理结构 是3 \ times4全阶乘,有三个苜蓿品种和四个 1943年第三次切割的日期.实验单位被安排 分为六个区块,每个区块又分为四个地块.苜蓿的品种 (Cossac,Ladak和Ranger)被随机分配给这些区块, 第三次切割的日期(无,S1-9月1日,S20-9月20日, 和O7-10月7日)被随机分配到该地块. 所有四个日期都用在每个块上.
These data are described in Snedecor and Cochran (1980) as an example of a split-plot design. The treatment structure used in the experiment was a 3\times4 full factorial, with three varieties of alfalfa and four dates of third cutting in 1943. The experimental units were arranged into six blocks, each subdivided into four plots. The varieties of alfalfa (Cossac, Ladak, and Ranger) were assigned randomly to the blocks and the dates of third cutting (None, S1—September 1, S20—September 20, and O7—October 7) were randomly assigned to the plots. All four dates were used on each block.
model<-with (Alfalfa, aov(Yield~Variety*Date +Error(Block/Date/Variety)))
> summary(model)
Error: Block
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 5 4.15 0.83
Error: Block:Date
Df Sum Sq Mean Sq F value Pr(>F)
Date 3 1.9625 0.6542 17.84 3.29e-05 ***
Residuals 15 0.5501 0.0367
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: Block:Date:Variety
Df Sum Sq Mean Sq F value Pr(>F)
Variety 2 0.1780 0.08901 1.719 0.192
Variety:Date 6 0.2106 0.03509 0.678 0.668
Residuals 40 2.0708 0.05177
我想进行一些比较(一组内的正交对比),例如日期,两个对比:
I want to perform some comparison (orthogonal contrasts within a group), for example for date, two contrasts:
(a) S1 vs others (S20 O7)
(b) S20 vs 07,
对于变异因子2的对比:
For variety factor two contrasts:
(c) Cossack vs others (Ladak and Ranger)
(d) Ladak vs Ranger
因此,方差分析的输出如下所示:
Thus the anova output would look like:
Error: Block
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 5 4.15 0.83
Error: Block:Date
Df Sum Sq Mean Sq F value Pr(>F)
Date 3 1.9625 0.6542 17.84 3.29e-05 ***
(a) S1 vs others ? ?
(b) S20 vs 07 ? ?
Residuals 15 0.5501 0.0367
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Error: Block:Date:Variety
Df Sum Sq Mean Sq F value Pr(>F)
Variety 2 0.1780 0.08901 1.719 0.192
(c) Cossack vs others ? ? ?
(d) Ladak vs Ranger ? ? ?
Variety:Date 6 0.2106 0.03509 0.678 0.668
Residuals 40 2.0708 0.05177
我该如何执行呢? ....................
How can I perform this ? ....................
推荐答案
首先,为什么要使用ANOVA?您可以使用nlme
包中的lme
,除了aov
给您的假设检验之外,您还可以获得效果大小和效果方向的可解释的估计.无论如何,我想到了两种方法:
First of all, why use ANOVA? You can use lme
from the nlme
package and in addition to the hypothesis tests aov
gives you, you also get interpretable estimates of the effect sizes and the directions of the effects. At any rate, two approaches come to mind:
- Specify contrasts on the variables manually, as explained here.
- Install the
multcomp
package and useglht
.
glht
对预测变量为多变量的模型有些怀疑.长话短说,但是,如果要创建与模型的vcov
具有相同尺寸和暗名的对角矩阵cm0
(假设它是称为model0
的lme
拟合),则summary(glht(model0,linfct=cm0))
应该给出与summary(model0)$tTable
相同的估计值,SE和测试统计信息(但不正确 p值).现在,如果您弄乱了来自cm0
的行的线性组合,并创建了与cm0
相同列数的新矩阵,但是这些线性组合作为行,那么您最终将找出创建矩阵的模式,该矩阵将给您每个单元格的截距估计值(对照predict(model0,level=0)
进行检查).现在,另一个在此矩阵的各行之间具有差异的矩阵将为您提供相应的组间差异.可以使用相同的方法,但将数值设置为1而不是0,以获取每个像元的斜率估计值.然后,可以利用这些斜率估计值之间的差异来获得组间斜率差异.
glht
is a little opinionated about models that are multivariate in their predictors. Long story short, though, if you were to create a diagonal matrix cm0
with the same dimensions and dimnames as the vcov
of your model (let's assume it's an lme
fit called model0
), then summary(glht(model0,linfct=cm0))
should give the same estimates, SEs, and test statistics as summary(model0)$tTable
(but incorrect p-values). Now, if you mess around with linear combinations of rows from cm0
and create new matrices with the same number of columns as cm0
but these linear combinations as rows, you'll eventually figure out the pattern to creating a matrix that will give you the intercept estimate for each cell (check it against predict(model0,level=0)
). Now, another matrix with differences between various rows of this matrix will give you corresponding between-group differences. The same approach but with numeric values set to 1 instead of 0 can be used to get the slope estimates for each cell. Then the differences between these slope estimates can be used to get between-group slope differences.
要记住的三件事:
- 正如我所说,对于
lm
以外的模型(可能没有尝试)和某些生存模型,p值将是错误的.这是因为默认情况下glht
假定z
分发而不是t
分发(lm
除外).要获取正确的p值,请使用检验统计量glht
计算并手动执行2*pt(abs(
STAT),df=
DF,lower=F)
以获得两尾p -value,其中 STAT 是glht
返回的测试统计信息, DF 是summary(model0)$tTable
中相应默认对比度类型的df
. - 您的对比可能不再测试独立的假设,并且如果没有,则必须进行多次测试校正.通过
p.adjust
运行p值. - 这是我自己对教授和同事的大量挥霍,以及对相关主题的Crossvalidated和Stackoverflow的大量阅读的总结.我可能会以多种方式犯错,如果我是错的话,希望有更多知识的人会纠正我们两个人.
- As I said the p-values are going to be wrong for models other than
lm
, (possibly, haven't tried)aov
, and certain survival models. This is becauseglht
assumes az
distribution instead of at
distribution by default (except forlm
). To get correct p-values, take the test statisticglht
calculates and manually do2*pt(abs(
STAT),df=
DF,lower=F)
to get the two-tailed p-value where STAT is the test statistic returned byglht
and DF is thedf
from the corresponding type of default contrast insummary(model0)$tTable
. - Your contrasts probably no longer test independent hypotheses, and multiple testing correction is necessary, if it wasn't already. Run the p-values through
p.adjust
. - This is my own distillation of a lot of handwaving from professors and colleagues, and a lot of reading of Crossvalidated and Stackoverflow on related topics. I could be wrong in multiple ways, and if I am, hopefully someone more knowlegeable will correct us both.
这篇关于R中的方差分解和比较(正交单个df)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!