R-用lme4分析重复测量不平衡设计? [英] R- analyzing repeated measures unbalanced design with lme4?

查看:81
本文介绍了R-用lme4分析重复测量不平衡设计?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在我的实验中,我修剪了植物并测量了它们的反应,例如在季节结束时产生的叶子质量.我操纵了剪裁强度和剪裁时间,并交叉了这两种处理方式.我还包括一个控制剪裁处理,导致 5 种不同的剪裁处理组合.每次处理 12 株植物,我在两年内跟踪了总共 60 株植物.也就是说,我在第 1 年收集了这 60 株植物的测量值,并在第 2 年再次收集了相同的植物.

For my experiment, I clipped plants and measured their responses, such as leaf mass produced, at the end of the season. I manipulated both clipping intensity and clipping time and crossed these two treatments. I also included a control clipped treatment resulting in 5 different clipping treatment combinations. With 12 plants per treatment there is a total of 60 plants which I followed over the course of two years. That is, I collected measurements on these 60 plants in year 1 and the same plants again in year 2.

最简单的方法是分别分析 5 种不同的处理方法.然而,我想获得时间和强度的影响及其相互作用,但由于控制处理没有与时间或强度完全交叉,这使得我的实验设计不平衡且统计上棘手.更复杂的是,我想将年份的影响也包括到我的模型中.

It would be simplest to just analyze the 5 different treatments separately. However, I would like to obtain the effects of timing and intensity and their interactions, but because of the control treatment which is not fully crossed with either timing or intensity, this makes my experimental design unbalanced and statistically tricky. To complicate this a bit more, I would like to include the effect of year into my model as well.

理想情况下,我可以使用 lme4 来做到这一点,这使得使用 lsmeans 包之后的多重比较变得轻而易举.

Ideally, I would be able to do this using lme4 which makes multiple comparison a breeze afterwards with the lsmeans package.

当我尝试运行我的模型时

When I try to run my model

     m1<-lmer(log(plant.leaf.g+1)~timing*intensity*year+(1|id), data=cmv) #not significant

我遇到了警告固定效应模型矩阵是秩不足的,因此删除了 8 列/系数".

I am met with the warning "fixed-effect model matrix is rank deficient so dropping 8 columns / coefficients".

有谁知道我可以让这个不平衡的混合模型与 lme4 一起工作的方法吗?

Does anyone know of a way I can get this unbalanced mixed model to work with lme4?

这是我的数据的一个子集,其中从不"在计时和零"在强度下任意替换了控制"处理:

Here is a subset of my data where "never" under timing and "zero" under intensity arbitrarily replaced "control" treatment:

id  year    timing  intensity   treatment   plant.leaf.g
91  2015    early   low early-low   315.944
92  2015    never   zero    control 99.28
93  2015    late    high    late-high   663.936
94  2015    early   low early-low   25.488
95  2015    early   high    early-high  453.57
96  2015    late    low late-low    90.804
97  2015    never   zero    control 1312.098
98  2015    late    high    late-high   959.82
99  2015    late    low late-low    28.014
100 2015    late    high    late-high   178.56
91  2014    early   low early-low   289.14
92  2014    never   zero    control 61.774
93  2014    late    high    late-high   639.936
94  2014    early   low early-low   138.39
95  2014    early   high    early-high  168.216
96  2014    late    low late-low    51.008
97  2014    never   zero    control 966.112
98  2014    late    high    late-high   279.048
99  2014    late    low late-low    23.936
100 2014    late    high    late-high   169.344

cmv<-structure(list(id = c(91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 
99L, 100L, 101L, 102L, 103L, 105L, 106L, 107L, 108L, 109L, 110L, 
91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L, 101L, 102L, 
103L, 104L, 105L, 106L, 107L, 108L, 109L, 110L), year = c(2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
2014L, 2014L), timing = structure(c(1L, 3L, 2L, 1L, 1L, 2L, 3L, 
2L, 2L, 2L, 2L, 1L, 1L, 2L, 3L, 1L, 1L, 3L, 2L, 1L, 3L, 2L, 1L, 
1L, 2L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 3L, 1L, 1L, 3L, 2L
), .Label = c("early", "late", "never"), class = "factor"), intensity =     structure(c(2L, 
3L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 3L, 2L, 1L, 
3L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 
2L, 3L, 2L, 1L, 3L, 1L), .Label = c("high", "low", "zero"), class = "factor"), 
treatment = structure(c(3L, 1L, 4L, 3L, 2L, 5L, 1L, 4L, 5L, 
4L, 5L, 2L, 2L, 5L, 1L, 3L, 2L, 1L, 4L, 3L, 1L, 4L, 3L, 2L, 
5L, 1L, 4L, 5L, 4L, 5L, 2L, 2L, 5L, 5L, 1L, 3L, 2L, 1L, 4L
), .Label = c("control", "early-high", "early-low", "late-high", 
"late-low"), class = "factor"), plant.stem.g = c(315.944, 
99.28, 663.936, 25.488, 453.57, 90.804, 1312.098, 959.82, 
28.014, 178.56, 158.12, 387.528, 288.75, 327.348, 770.44, 
835.05, 457.188, 942.002, 229.194, 289.14, 61.774, 639.936, 
138.39, 168.216, 51.008, 966.112, 279.048, 23.936, 169.344, 
154.14, 703.04, 836.4, 511.92, 463.524, 245.226, 267.41, 
439.392, 714.85, 68.012)), .Names = c("id", "year", "timing", 
"intensity", "treatment", "plant.stem.g"), class = "data.frame", row.names =     c(NA, 
-39L))

注意:我已经让 m1=aov(plant.leaf.g~intensity*timing*year+Error(id), data=cmv) 运行,但我读到我应该使用car 包中的 Anova type="3" 函数来获取我的 p 值,但我无法使用 Error(id) 项来做到这一点.我也无法与 TukeyHSD 函数或 multcomp 包进行多重比较.

Note: I have gotten m1=aov(plant.leaf.g~intensity*timing*year+Error(id), data=cmv) to run, but I read that I should use Anova type="3" function from the car package to obtain my p-values, but I haven't been able to do this with the Error(id) term. Nor have I been able do a multiple comparison with TukeyHSD function or multcomp package.

推荐答案

 m1<-lmer(log(plant.leaf.g+1)~timing*intensity*year+(1|id), 
          data=cmv)

(除了其中包含零的对数转换数据很棘手;您确定加 1 是正确的吗?只有在叶质量无单位时才有意义.您可以考虑添加 min(plant.leaf.g[plant.leaf.g>0])/2 代替...)

(except that log-transforming data with zeros in it is tricky; are you sure that adding 1 is correct? It only makes perfect sense if leaf mass is unitless. You might consider adding min(plant.leaf.g[plant.leaf.g>0])/2 instead ...)

出现警告(不是错误)是因为您的数据集中没有时间、强度和年份的所有组合,但您要求 R 估计每个组合的参数.一些合理的选择是:

The warning (not an error) occurs because you don't have all combinations of timing, intensity, and year in your data set, but you are asking R to estimate parameters for every combination. A few reasonable choices are:

  • 忽略警告(在比较每个因素的整体影响时,您可能无论如何都会得到合理的答案)
  • 降低模型的复杂性,特别是通过消除 3 向交互(即使用 (timing+intensity+year)^2)(我假设这会起作用,但是您如果您的数据中缺少时间和强度的组合,则可能需要进一步简化模型)
  • 从三向交互构建单向方差分析,例如cmv$int <- with(cmv,interaction(timing,intensity,year,drop=TRUE))(但是你将无法分离主效应和交互)
  • ignore the warning (you'll probably get reasonable answers anyway when comparing the overall effect of each factor)
  • reduce the complexity of the model, in particular by eliminating the 3-way interaction (i.e. use (timing+intensity+year)^2) (I'm assuming this will work, but you might need to simplify the model still further if e.g. there are combinations of timing and intensity that are missing from your data)
  • construct a one-way ANOVA from the 3-way interaction, e.g. cmv$int <- with(cmv,interaction(timing,intensity,year,drop=TRUE)) (but then you won't be able to separate main effects and interactions)

这篇关于R-用lme4分析重复测量不平衡设计?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆