更适合线性模型 [英] Better fits for a linear model

查看:89
本文介绍了更适合线性模型的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在拟合一些线条,感觉就像我在告诉R确切如何拟合它们,但是我感觉有些东西(某些因素或影响)我并不知道这妨碍了拟合.

I am fitting some lines and I feel like I am telling R exactly how to fit them, but I feel like there is something (some factor or effect) I am unaware of that is preventing a good fit.

我的实验单位是绘图",就像田间绘图一样,对此我感到困惑.

My experimental unit is "plot" as in field plot, which I am sorry is confusing.

可以找到数据: https://www.dropbox.com/s /a0tplyvs8lxu1d0/rootmeansv2.csv . 与

df$plot.f<-as.factor(df$plot)
dfG<-groupedData(mass ~ year|plot.f, data=df)
dfG30<-dfG[dfG$depth == 30,]

简单来说,我随时间推移具有质量,因此我将该模型拟合到每个实验单元中:

Simply, I have mass over time and I fit it to each experimental unit with the model:

fit <- lme(mass ~ year , random = ~ 1 | plot, data = df)

并使用plot (augPred(fit))得到每个实验单位的拟合值(图"):

and with plot (augPred(fit)) I get these fits for each experimental unit ("plot"):

我该怎么做才能使实验单位之间的斜率更多地变化?从统计的角度来看,我对此并不感兴趣,但从预测的角度来看,我可以对模型中的任何内容进行操作以使这些线移动.

What do I need to do to let the slope vary more between experimental units? I am not interested in this from a statistical perspective, but from a predictive perspective -- so anything in the model can be manipulated to get those lines to move.

推荐答案

这个答案将有点百科全书,因为关于您的问题有几个重要点. tl; dr 是将year*plot作为随机效果添加的第一步,但是拟合实际上有点问题,尽管起初并不如此:将year变量居中问题,但是对数据进行日志转换可能是一个更好的主意.

This answer is going to be a bit encyclopedic, because there are several important points about your question. tl;dr adding year*plot as a random effect is the first step, but the fit is actually a bit problematic, although it doesn't appear so at first: centering the year variable takes care of the problem, but log-transforming the data might be an even better idea.

更新:OP正在对subset(df,Depth==30)进行分析.我无意中对整个数据集进行了分析,这可能使事情变得更加艰难-下文记录的异方差性对于数据的子集可能并不那么糟糕-但出于教学目的,我将其保留下来(而且出于懒惰).

UPDATE: OP was doing an analysis on subset(df,Depth==30). I accidentally did the analysis on the full data set, which may have made things harder -- the heteroscedasticity documented below probably isn't as bad for a subset of the data -- but I'm going to leave it as is for pedagogical purposes (and out of laziness).

获取数据:

df <- read.csv("rootmeansv2.csv")
library(nlme)
gdf <- groupedData( mass ~ year | plot, data=df)

将逐年交互添加到模型中作为随机效应:

Adding the year-by-plot interaction to the model as a random effect:

fit0 <- lme(mass ~ year , random = ~ year | plot, data = gdf)

但是,如果我们看一下结果,就会发现这根本无济于事-year项(斜率中的方差)很小.

However, if we look at the results we see that this doesn't help very much at all -- the year term (among-plot variance in the slope) is tiny.

##             Variance     StdDev       Corr  
## (Intercept) 9.522880e-12 3.085916e-06 (Intr)
## year        7.110616e-08 2.666574e-04 0.32  
## Residual    3.885373e-01 6.233276e-01       

有时确实会发生这种情况(单数拟合),但是否则lme摘要不会以任何其他方式表明可能有问题.但是,如果我们尝试获得置信区间,我们会发现可能会有麻烦:

This does sometimes happen (a singular fit), but otherwise the lme summary doesn't indicate in any other way that there might be something wrong. However, if we try to get confidence intervals we see there might be trouble:

 intervals(fit0)
 ##   Error in intervals.lme(fit0) : 
 ##   cannot get confidence intervals on var-cov components: Non-positive definite approximate variance-covariance

另一种重复检查的方法是在lme4中安装相同的模型.

The other way to double-check is to fit the same model in lme4.

library(lme4)
VarCorr(fit1 <- lmer(mass ~ year +(year | plot), data = gdf))

##  Groups   Name        Std.Dev.   Corr  
##  plot     (Intercept) 0.66159674       
##           year        0.00059471 -1.000
##  Residual             0.62324403       
## Warning messages:
## 1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
##   Model failed to converge with max|grad| = 0.132739 (tol = 0.002, component 1)
## 2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
##   Model failed to converge: degenerate  Hessian with 1 negative eigenvalues

我们显然得到了截然不同的答案和关于收敛的警告(这是在开发版本1.1-7中,它没有遭受广泛报道的针对1.1-6的错误肯定警告).看来lme4的合身性稍好一些:

We get apparently quite different answers, and warnings about convergence (this is in the development version, 1.1-7, which does not suffer from the false-positive warnings reported widely for 1.1-6). It looks like lme4's fit is slightly better:

c(logLik(fit1)-logLik(fit0))
## 0.1775161

复杂拟合(例如混合模型)的潜在数据问题之一是预测变量的居中和缩放不佳.在这种情况下,由于year是从2008年开始的连续预测变量,所以截距和斜率高度相关(strong> extremely )(截距是0年的预测值!).在这种情况下,我们可以通过居中来解决问题-减去最小值(即从0开始而不是从2008年开始)也很合理

One of the potential data problems for complex fits such as mixed models is poor centering and scaling of predictor variables. In this case, because year is a continuous predictor that starts at 2008, the intercept and slopes are extremely highly correlated (the intercept is the predicted value in year 0!). We can fix the problem in this case by centering -- it would also be reasonable to subtract the minimum (i.e. start year at 0 rather than 2008)

c. <- function(x) scale(x,center=TRUE,scale=FALSE)
VarCorr(fit2 <- update(fit1,.~ c.(year) +(c.(year) | plot)))

##  Groups   Name        Std.Dev. Corr 
##  plot     (Intercept) 0.53798       
##           c.(year)    0.10515  1.000
##  Residual             0.59634       

我们得到了更合理的答案(没有警告),尽管我们仍然具有完美(现在是正相关)的截距和斜率-这仅表示该模型略微拟合了数据,但是对此我们无能为力(我们可以通过拟合模型~c.(year)+(c.(year)||plot))来将相关性强制为零,但这有其自身的问题).

We get more reasonable answers (and no warnings), although we do still have perfectly (now positively) correlated intercepts and slopes -- this just says the model is slightly overfitting the data, but there's not much we can do about this (we could force the correlation to zero by fitting the model ~c.(year)+(c.(year)||plot)), but this has its own problems).

现在在lme中尝试:

VarCorr(fit3 <- update(fit0,
                       fixed.=~c.(year),
                       random=~c.(year)|plot,
         control=lmeControl(opt="optim")))
## plot = pdLogChol(c.(year)) 
##             Variance   StdDev    Corr  
## (Intercept) 0.28899909 0.5375864 (Intr)
## c.(year)    0.01122497 0.1059479 0.991 
## Residual    0.35559015 0.5963138       

结果相似(尽管相关性仅为0.991,而不是1.0):对数似然率的差异实际上稍大,但仍然很小. (拟合仍然有些问题-我必须更改lmeControl参数中所示的优化程序才能到达此处.)

Results are similar (although correlation is only 0.991 instead of 1.0): the difference in log-likelihoods is actually slightly larger, but still small. (The fit is still somewhat problematic -- I had to change optimizers as shown in the lmeControl argument to get here.)

现在情况看起来更好了:

Things now look better:

plot(augPred(fit3))

但是,残差与拟合图显示了您应该担心的问题:

However, the residual vs fitted plot shows you a problem you should worry about:

plot(fit3)

漏斗形状显示您有异方差问题.比例位置图更加清晰地显示了这一点:

The funnel shape shows you have a heteroscedasticity problem. The scale-location plot shows it even more clearly:

plot(fit3,sqrt(abs(resid(.)))~fitted(.),type=c("p","smooth"))

最明显的解决方法是对数据进行日志转换:

The most obvious fix is to log-transform the data:

df$logmass <- log10(df$mass)  ## log10() for interpretability
gdfL <- groupedData( logmass ~ year | plot, data=df)
VarCorr(fitL1 <- lme(logmass ~ c.(year) , random = ~ c.(year) | plot,
             data = gdfL))
## plot = pdLogChol(c.(year)) 
##             Variance     StdDev     Corr  
## (Intercept) 0.1842905872 0.42929080 (Intr)
## c.(year)    0.0009702893 0.03114947 -0.607
## Residual    0.1052946948 0.32449144       

年度间的差异又很小,但这一次可能是因为不需要.当我们拟合等效模型时,没有来自lmer的警告,我们得到相同的结果;拟合是非奇异的(无零方差或完美的相关性);和intervals(fitL1)起作用.

The among-year variation is small again, but this time it's probably because it's not needed. There are no warnings from lmer when we fit the equivalent model, and we get identical results; fit is non-singular (no zero variances or perfect correlations); and intervals(fitL1) works.

预测看起来很合理:

plot(augPred(fitL1))

诊断图也是如此:

plot(fitL1)

尽管2008年确实有些有趣(轴被翻转是因为lme倾向于水平绘制箱线图-factor(year)告诉lme使用箱线图而不是散点图).

Although there does appear to be something a bit funny about 2008 (the axes are flipped because lme prefers to plot boxplots horizontally -- factor(year) tells lme to use a boxplot instead of a scatterplot).

plot(fitL1,factor(year)~resid(.))

这篇关于更适合线性模型的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆