R中的方差分解和比较(正交单个df) [英] partition of anova and comparisons (orthogonal single df) in r

查看:584
本文介绍了R中的方差分解和比较(正交单个df)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想在方差分析(固定或混合模型)中做单个df正交对比.这只是示例:

I want to do single df orthogonal contrast in anova (fixed or mixed model). Here is just example:

require(nlme)
data (Alfalfa)
  Variety: a factor with levels Cossack, Ladak, and Ranger
  Date : a factor with levels None S1 S20 O7
  Block: a factor with levels 1 2 3 4 5 6
  Yield : a numeric vector

这些数据在Snedecor和Cochran(1980)中进行了描述 图的设计.实验中使用的处理结构 是3 \ times4全阶乘,有三个苜蓿品种和四个 1943年第三次切割的日期.实验单位被安排 分为六个区块,每个区块又分为四个地块.苜蓿的品种 (Cossac,Ladak和Ranger)被随机分配给这些区块, 第三次切割的日期(无,S1-9月1日,S20-9月20日, 和O7-10月7日)被随机分配到该地块. 所有四个日期都用在每个块上.

These data are described in Snedecor and Cochran (1980) as an example of a split-plot design. The treatment structure used in the experiment was a 3\times4 full factorial, with three varieties of alfalfa and four dates of third cutting in 1943. The experimental units were arranged into six blocks, each subdivided into four plots. The varieties of alfalfa (Cossac, Ladak, and Ranger) were assigned randomly to the blocks and the dates of third cutting (None, S1—September 1, S20—September 20, and O7—October 7) were randomly assigned to the plots. All four dates were used on each block.

model<-with (Alfalfa, aov(Yield~Variety*Date +Error(Block/Date/Variety)))

    > summary(model)

Error: Block
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  5   4.15    0.83

Error: Block:Date
          Df Sum Sq Mean Sq F value   Pr(>F)
Date       3 1.9625  0.6542   17.84 3.29e-05 ***
Residuals 15 0.5501  0.0367
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Block:Date:Variety
             Df Sum Sq Mean Sq F value Pr(>F)
Variety       2 0.1780 0.08901   1.719  0.192
Variety:Date  6 0.2106 0.03509   0.678  0.668
Residuals    40 2.0708 0.05177

我想进行一些比较(一组内的正交对比),例如日期,两个对比:

I want to perform some comparison (orthogonal contrasts within a group), for example for date, two contrasts:

   (a) S1 vs others (S20 O7)
   (b) S20 vs 07,

对于变异因子2的对比:

For variety factor two contrasts:

  (c)  Cossack vs others (Ladak and Ranger)
   (d) Ladak vs Ranger

因此,方差分析的输出如下所示:

Thus the anova output would look like:

Error: Block
              Df Sum Sq Mean Sq F value Pr(>F)
Residuals  5   4.15    0.83

Error: Block:Date
          Df Sum Sq Mean Sq F value   Pr(>F)
Date       3 1.9625  0.6542   17.84 3.29e-05 ***
       (a) S1 vs others    ?        ?
       (b)  S20 vs 07       ?        ?
Residuals 15 0.5501  0.0367
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Block:Date:Variety
             Df Sum Sq Mean Sq F value Pr(>F)
Variety       2 0.1780 0.08901   1.719  0.192
     (c)  Cossack vs others ?        ?    ?
     (d)  Ladak vs Ranger    ?       ?     ?
Variety:Date  6 0.2106 0.03509   0.678  0.668
Residuals    40 2.0708 0.05177

我该如何执行呢? ....................

How can I perform this ? ....................

推荐答案

首先,为什么要使用ANOVA?您可以使用nlme包中的lme,除了aov给您的假设检验之外,您还可以获得效果大小和效果方向的可解释的估计.无论如何,我想到了两种方法:

First of all, why use ANOVA? You can use lme from the nlme package and in addition to the hypothesis tests aov gives you, you also get interpretable estimates of the effect sizes and the directions of the effects. At any rate, two approaches come to mind:

  • 按照
  • Specify contrasts on the variables manually, as explained here.
  • Install the multcomp package and use glht.

glht对预测变量为多变量的模型有些怀疑.长话短说,但是,如果要创建与模型的vcov具有相同尺寸和暗名的对角矩阵cm0(假设它是称为model0lme拟合),则summary(glht(model0,linfct=cm0))应该给出与summary(model0)$tTable相同的估计值,SE和测试统计信息(但不正确 p值).现在,如果您弄乱了来自cm0的行的线性组合,并创建了与cm0相同列数的新矩阵,但是这些线性组合作为行,那么您最终将找出创建矩阵的模式,该矩阵将给您每个单元格的截距估计值(对照predict(model0,level=0)进行检查).现在,另一个在此矩阵的各行之间具有差异的矩阵将为您提供相应的组间差异.可以使用相同的方法,但将数值设置为1而不是0,以获取每个像元的斜率估计值.然后,可以利用这些斜率估计值之间的差异来获得组间斜率差异.

glht is a little opinionated about models that are multivariate in their predictors. Long story short, though, if you were to create a diagonal matrix cm0 with the same dimensions and dimnames as the vcov of your model (let's assume it's an lme fit called model0), then summary(glht(model0,linfct=cm0)) should give the same estimates, SEs, and test statistics as summary(model0)$tTable (but incorrect p-values). Now, if you mess around with linear combinations of rows from cm0 and create new matrices with the same number of columns as cm0 but these linear combinations as rows, you'll eventually figure out the pattern to creating a matrix that will give you the intercept estimate for each cell (check it against predict(model0,level=0)). Now, another matrix with differences between various rows of this matrix will give you corresponding between-group differences. The same approach but with numeric values set to 1 instead of 0 can be used to get the slope estimates for each cell. Then the differences between these slope estimates can be used to get between-group slope differences.

要记住的三件事:

  • 正如我所说,对于lm以外的模型(可能没有尝试)和某些生存模型,p值将是错误的.这是因为默认情况下glht假定z分发而不是t分发(lm除外).要获取正确的p值,请使用检验统计量glht计算并手动执行2*pt(abs( STAT ),df= DF ,lower=F)以获得两尾p -value,其中 STAT glht返回的测试统计信息, DF summary(model0)$tTable中相应默认对比度类型的df.
  • 您的对比可能不再测试独立的假设,并且如果没有,则必须进行多次测试校正.通过p.adjust运行p值.
  • 这是我自己对教授和同事的大量挥霍,以及对相关主题的Crossvalidated和Stackoverflow的大量阅读的总结.我可能会以多种方式犯错,如果我是错的话,希望有更多知识的人会纠正我们两个人.
  • As I said the p-values are going to be wrong for models other than lm, (possibly, haven't tried) aov, and certain survival models. This is because glht assumes a z distribution instead of a t distribution by default (except for lm). To get correct p-values, take the test statistic glht calculates and manually do 2*pt(abs(STAT),df=DF,lower=F) to get the two-tailed p-value where STAT is the test statistic returned by glht and DF is the df from the corresponding type of default contrast in summary(model0)$tTable.
  • Your contrasts probably no longer test independent hypotheses, and multiple testing correction is necessary, if it wasn't already. Run the p-values through p.adjust.
  • This is my own distillation of a lot of handwaving from professors and colleagues, and a lot of reading of Crossvalidated and Stackoverflow on related topics. I could be wrong in multiple ways, and if I am, hopefully someone more knowlegeable will correct us both.

这篇关于R中的方差分解和比较(正交单个df)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆