使用R中的渐变包为线性混合模型指定相关结构 [英] Specifying a correlation structure for a linear mixed model using the ramps package in R

查看:85
本文介绍了使用R中的渐变包为线性混合模型指定相关结构的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试创建一个线性混合模型(lmm),该模型允许在点之间进行空间相关(每个点都有经度/纬度).我希望空间相关性基于点之间的较大圆距离.

软件包 ramps 包含一个相关结构,用于计算"haversine"距离-尽管我在实现它时遇到了麻烦.我以前使用过其他相关结构( corGaus corExp ),并且没有任何困难.我假设带有'haversine'指标的 corRGaus 可以以相同的方式实现.

我能够使用 lme 函数成功创建具有在平面距离上计算出的空间相关性的lmm.

尽管使用 gls 命令的相关结构存在错误,但我也可以创建具有大圆距计算出的空间相关性的线性模型(不混合).

当尝试对圆形距离较大的线性模型使用 gls 命令时,出现以下错误:

  x = runif(20,1,50)y =符文(20,1,50)gls(x〜y,cor = corRGaus(form =〜x + y))REML的广义最小二乘拟合型号:x〜y数据:NULL对数限制可能性:-78.44925系数:(拦截)y24.762656602 0.007822469相关结构:corRGaus公式:〜x + y参数估计:attr(object,"fixed")&&中的错误不受约束的:'x&&中的无效'x'类型' 

当我增加数据大小时,会出现内存分配错误(仍然是非常小的数据集):

  x = runif(100,1,50)y =符(100,1,50)lat = runif(100,-90,90)long =符文(100,-180,180)gls(x〜y,cor = corRGaus(form =〜x + y))glsEstimate(glsSt,control = glsEstControl)中的错误:'Calloc'无法分配内存(18446744073709551616 of 8 bytes) 

当尝试使用 lme 命令和 ramps 包中的 corRGaus 来运行混合模型时,以下结果:

  x = runif(100,1,50)y =符(100,1,50)LC = c(rep(1,50),rep(2,50))lat = runif(100,-90,90)long =符文(100,-180,180)lme(x〜y,random =〜y | LC,cor = corRGaus(form =〜long + lat))coef<-.corSpatial`(`* tmp *`,value = value [parMap [,i]])中的错误:外部函数调用中的NA/NaN/Inf(arg 1)另外:警告消息:1:在nlminb(c(coef(lmeSt)),function(lmePars)-logLik(lmeSt,lmePars)中,:NA/NaN功能评估2:在nlminb(c(coef(lmeSt)),function(lmePars)-logLik(lmeSt,lmePars)中,:NA/NaN功能评估 

我不确定如何继续使用此方法.我想使用"haversine"功能来完成我的模型,但是我在实现它们时遇到了麻烦.关于 ramps 包的问题很少,而且我看到的实现也很少.任何帮助将不胜感激.

我以前曾尝试修改 nlme 软件包,但无法这样做.我发布了一个有关的问题,其中建议我使用 ramps 包.

我在Windows 8计算机上使用R 3.0.0.

解决方案

好的,这是一个选项,可以在 gls / nlme 中以hasrsine距离实现各种空间相关结构.

在给定距离度量的情况下,各种 corSpatial 类型的类已经具备了根据空间协变量构造相关矩阵的机制.不幸的是, dist 并没有实现正弦距离,而 dist corSpatial 所调用的函数,用于根据空间协变量计算距离矩阵.

距离矩阵的计算是在 getCovariate.corSpatial 中执行的.此方法的修改形式将与其他方法保持适当的距离,并且大多数方法都不需要修改.

在这里,我创建一个新的 corStruct corHaversine ,并仅修改 getCovariate 和其他方法( Dim ),确定使用哪个相关函数.这些不需要修改的方法是从等效的 corSpatial 方法中复制的. corHaversine 中的(新) mimic 参数采用具有相关功能的空间类的名称:默认情况下,它设置为" corSpher ".

注意:除了确保该代码针对球形和高斯相关函数运行之外,我还没有做过很多检查.

  #### corHaversine-与Haversine距离的空间相关性#使用Haversine公式计算由弧度纬度/经度指定的两点之间的测地距离.#输出(以公里为单位)Haversine<-function(x0,x1,y0,y1){a<-sin((y1-y0)/2)^ 2 + cos(y0)* cos(y1)* sin((x1-x0)/2)^ 2v<-2 * asin(min(1,sqrt(a)))6371 * v}#函数在给定经度/纬度的两列矩阵的情况下计算测地线的hasversine距离如果弧度= F,则以十进制形式假定#输入#注意fields :: rdist.earth效率更高haversineDist<-函数(xy,弧度= F){如果(ncol(xy)> 2)stop(输入必须具有两列(经度和纬度)")如果(弧度== F)xy <-xy * pi/180hMat<-matrix(NA,ncol = nrow(xy),nrow = nrow(xy))为(i in 1:nrow(xy)){对于(j:i:nrow(xy)){hMat [j,i] <-Haversine(xy [i,1],xy [j,1],xy [i,2],xy [j,2])}}as.dist(hMat)}##对于大多数方法,corSpatial中的机器无需修改即可工作Initialize.corHaversine<-nlme ::: Initialize.corSpatialrecalc.corHaversine<-nlme ::: recalc.corSpatialVariogram.corHaversine<-nlme ::: Variogram.corSpatialcorFactor.corHaversine<-nlme :::: corFactor.corSpatialcorMatrix.corHaversine<-nlme ::: corMatrix.corSpatialcoef.corHaversine<-nlme ::: coef.corSpatial"coef<-.corHaversine"<-nlme :::"coef<-.corSpatial"## corHaversine类的构造方法corHaversine<-函数(值=数字(0),形式=〜1,模拟="corSpher",块=假,固定=假){spClass<-"corHaversine"attr(value,"formula")<-形式attr(值,块")<-块attr(value,"fixed")<-固定attr(value,"function")<-模拟class(value)<-c(spClass,"corStruct")价值}#结束corHaversine类环境(corHaversine)<-asNamespace("nlme")Dim.corHaversine<-function(object,groups,...){如果(丢失(组))返回(attr(对象,昏暗"))val<-Dim.corStruct(对象,组)val [["start"]]<-c(0,cumsum(val [["len"]] *(val [["len"]]-1)/2)[-val [["M"]]])##将Dim列表的第三个组件用于spClassnames(val)[3]<-"spClass"val [[3]]<-match(attr(object,"function"),c("corSpher","corExp","corGaus","corLin","corRatio"),0)值}环境(Dim.corHaversine)<-asNamespace("nlme")## corHaversine类的getCovariate方法getCovariate.corHaversine<-函数(对象,形式=公式(对象),数据){if(is.null(covar<-attr(object,"covariate"))){#如果对象缺少协变量属性if(missing(data)){#如果对象缺少数据停止(需要数据来计算协变量")}covForm<-getCovariateFormula(form)if(length(all.vars(covForm))> 0){#如果存在协变量if(attr(terms(covForm),"intercept")== 1){#如果公式包含截距covForm<-eval(parse(text = paste(〜",deparse(covForm [[2]]),-1",sep =")))#删除拦截}#只能使用具有正确名称的协变量如果(length(all.vars(covForm))> 2)stop("corHaversine只能接受两个协变量,'lon'和'lat'")if(!all(all.vars(covForm)%in%c("lon","lat")))stop(协变量必须命名为'lon'和'lat'")covar<-as.data.frame(unclass(model.matrix(covForm,model.frame(covForm,data,drop.unused.levels = TRUE))))covar<-covar [,order(colnames(covar),递减= T)]#作为lon ... lat的顺序}别的 {covar<-NULL}if(!is.null(getGroupsFormula(form))){#如果公式中的组按组提取Covargrps<-getGroups(对象,数据=数据)如果(is.null(covar)){covar<-lapply(split(grps,grps),function(x)as.vector(dist(1:length(x))))#填充符?}别的 {GiveDist<-function(el){el<-as.matrix(el)如果(nrow(el)> 1)as.vector(haversineDist(el))其他数值(0)}covar<-lapply(split(covar,grps),GiveDist)}covar<-covar [sapply(covar,length)>0]#没有1-obs组}否则{#如果公式中没有组提取距离如果(is.null(covar)){covar<-as.vector(dist(1:nrow(data)))}别的 {covar<-as.vector(haversineDist(as.matrix(covar)))}}if(any(unlist(covar)== 0)){#检查距离是否为零停止(在\"corHaversine \"中不能有零距离)}}科瓦#end方法getCovariate环境(getCovariate.corHaversine)<-asNamespace("nlme") 

要测试它是否运行,请给定范围参数1000:

  ##测试corHaversine以球形相关性运行(不测试它是否起作用...)图书馆(MASS)set.seed(1001)sample_data<-data.frame(lon = -121:-22,lat = -50:49)ran<-1000#'范围'参数用于球形相关dist_matrix<-as.matrix(haversineDist(sample_data))#Haversine距离矩阵#建立响应的相关矩阵corr_matrix<-1-1.5 *(dist_matrix/ran)+ 0.5 *(dist_matrix/ran)^ 3corr_matrix [dist_matrix>跑] = 0diag(corr_matrix)<-1#建立响应协方差矩阵sigma<-2#残留标准偏差cov_matrix<-(diag(100)* sigma)%*%corr_matrix%*%(diag(100)* sigma)#相关响应#产生回应sample_data $ y<-mvrnorm(1,mu = rep(0,100),Sigma = cov_matrix)#适合型号gls_haversine<-gls(y〜1,相关= corHaversine(form =〜lon + lat,mimic ="corSpher"),data = sample_data)摘要(gls_haversine)#关联结构:corHaversine#公式:〜lon + lat#参数估算值:#    范围#1426.818##系数:#值标准错误t值p值#(拦截)0.9397666 0.7471089 1.257871 0.2114##标准化残差:#最小Q1中Q3最大#-2.1467696 -0.4140958 0.1376988 0.5484481 1.9240042##残留标准误:2.739971#自由度:总共100;残留99 

使用范围参数= 100的高斯相关性进行测试:

  ##测试corHaversine以高斯相关性运行ran = 100#高斯相关参数corr_matrix_gauss<-exp(-(dist_matrix/ran)^ 2)diag(corr_matrix_gauss)<-1#建立响应协方差矩阵cov_matrix_gauss<--(diag(100)* sigma)%*%corr_matrix_gauss%*%(diag(100)* sigma)#相关响应#产生回应sample_data $ y_gauss<-mvrnorm(1,mu = rep(0,100),Sigma = cov_matrix_gauss)#适合型号gls_haversine_gauss<-gls(y_gauss〜1,相关= corHaversine(form =〜lon + lat,mimic ="corGaus"),data = sample_data)摘要(gls_haversine_gauss) 

使用 lme :

  ##与lme一起运行#设置具有组效果的数据group_y<-as.vector(sapply(1:5,function(.)mvrnorm(1,mu = rep(0,100),Sigma = cov_matrix_gauss)))group_effect<-rep(-2:2,每个= 100)group_y = group_y + group_effectgroup_name<-factor(group_effect)lme_dat<-data.frame(y = group_y,group = group_name,lon = sample_data $ lon,lat = sample_data $ lat)#适合型号lme_haversine<-lme(y〜1,随机=〜1 | group,相关= corHaversine(form =〜lon + lat,mimic ="corGaus"),data = lme_dat,control = lmeControl(opt ="optim")))摘要(lme_haversine)#关联结构:corHaversine#公式:〜lon + lat |团体#参数估算值:#    范围#106.3482#固定效果:y〜1#值标准错误DF t值p值#(拦截)-0.0161861 0.6861328 495 -0.02359033 0.9812##标准化组内残差:#最小Q1中Q3最大#-3.0393708 -0.6469423 0.0348155 0.7132133 2.5921573##观察数:500#团体人数:5 

I am trying to create a linear mixed model (lmm) that allows for a spatial correlation between points (have lat/long for each point). I would like the spatial correlation to be based upon the great circular distance between points.

The package ramps includes a correlation structure that computes the ‘haversine’ distance – although I am having trouble implementing it. I have previously used other correlation structures (corGaus, corExp) and not had any difficulties. I am assuming the corRGaus with the 'haversine' metric can be implemented in the same way.

I am able to successfully create an lmm with spatial correlation calculated on a planar distance using the lme function.

I am also able to create a linear model (not mixed) with spatial correlation calculated using great circular distance although there are errors with the correlation structure using the gls command.

When trying to the use the gls command for a linear model with the great circular distance I have the following errors:

x = runif(20, 1,50)
y = runif(20, 1,50)
gls(x ~ y, cor = corRGaus(form = ~ x + y))

Generalized least squares fit by REML
 Model: x ~ y 
 Data: NULL 
Log-restricted-likelihood: -78.44925

Coefficients:
 (Intercept)            y 
24.762656602  0.007822469 

Correlation Structure: corRGaus
 Formula: ~x + y 
 Parameter estimate(s):
Error in attr(object, "fixed") && unconstrained : 
 invalid 'x' type in 'x && y'

When I increase the size of the data there are memory allocation errors (still a very small dataset):

x = runif(100, 1, 50)
y = runif(100, 1, 50)
lat = runif(100, -90, 90)
long = runif(100, -180, 180)
gls(x ~ y, cor = corRGaus(form = ~ x + y))

Error in glsEstimate(glsSt, control = glsEstControl) : 
'Calloc' could not allocate memory (18446744073709551616 of 8 bytes)

When trying to run a mixed model using the lme command and the corRGaus from the ramps package the following results:

x = runif(100, 1, 50)
y = runif(100, 1, 50)
LC = c(rep(1, 50) , rep(2, 50))
lat = runif(100, -90, 90)
long = runif(100, -180, 180)

lme(x ~ y,random = ~ y|LC, cor = corRGaus(form = ~ long + lat))

Error in `coef<-.corSpatial`(`*tmp*`, value = value[parMap[, i]]) : 
  NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning messages:
1: In nlminb(c(coef(lmeSt)), function(lmePars) -logLik(lmeSt, lmePars),  :
  NA/NaN function evaluation
2: In nlminb(c(coef(lmeSt)), function(lmePars) -logLik(lmeSt, lmePars),  :
  NA/NaN function evaluation

I am unsure about how to proceed with this method. The "haversine" function is what I want to use to complete my models, but I am having trouble implementing them. There are very few questions anywhere about the ramps package, and I have seen very few implementations. Any helps would be greatly appreciated.

I have previously attempted to modify the nlme package and was unable to do so. I posted a question about this, where I was recommended to use the ramps package.

I am using R 3.0.0 on a Windows 8 computer.

解决方案

OK, here is an option that implements various spatial correlation structures in gls/nlme with haversine distance.

The various corSpatial-type classes already have machinery in place to construct a correlation matrix from spatial covariates, given a distance metric. Unfortunately, dist does not implement haversine distance, and dist is the function called by corSpatial to compute a distance matrix from the spatial covariates.

The distance matrix computations are performed in getCovariate.corSpatial. A modified form of this method will pass the appropriate distance to other methods, and the majority of methods will not need to be modified.

Here, I create a new corStruct class, corHaversine, and modify only getCovariate and one other method (Dim) that determines which correlation function is used. Those methods which do not need modification, are copied from equivalent corSpatial methods. The (new) mimic argument in corHaversine takes the name of the spatial class with the correlation function of interest: by default, it is set to "corSpher".

Caveat: beyond ensuring that this code runs for spherical and Gaussian correlation functions, I haven't really done a lot of checking.

#### corHaversine - spatial correlation with haversine distance

# Calculates the geodesic distance between two points specified by radian latitude/longitude using Haversine formula.
# output in km
haversine <- function(x0, x1, y0, y1) {
    a <- sin( (y1 - y0)/2 )^2 + cos(y0) * cos(y1) * sin( (x1 - x0)/2 )^2
    v <- 2 * asin( min(1, sqrt(a) ) )
    6371 * v
}

# function to compute geodesic haversine distance given two-column matrix of longitude/latitude
# input is assumed in form decimal degrees if radians = F
# note fields::rdist.earth is more efficient
haversineDist <- function(xy, radians = F) {
    if (ncol(xy) > 2) stop("Input must have two columns (longitude and latitude)")
    if (radians == F) xy <- xy * pi/180
    hMat <- matrix(NA, ncol = nrow(xy), nrow = nrow(xy))
    for (i in 1:nrow(xy) ) {
        for (j in i:nrow(xy) ) {
            hMat[j,i] <- haversine(xy[i,1], xy[j,1], xy[i,2], xy[j,2]) 
            }
        }
    as.dist(hMat)
}

## for most methods, machinery from corSpatial will work without modification
Initialize.corHaversine <- nlme:::Initialize.corSpatial
recalc.corHaversine <- nlme:::recalc.corSpatial
Variogram.corHaversine <- nlme:::Variogram.corSpatial
corFactor.corHaversine <- nlme:::corFactor.corSpatial
corMatrix.corHaversine <- nlme:::corMatrix.corSpatial
coef.corHaversine <- nlme:::coef.corSpatial
"coef<-.corHaversine" <- nlme:::"coef<-.corSpatial"

## Constructor for the corHaversine class
corHaversine <- function(value = numeric(0), form = ~ 1, mimic = "corSpher", nugget = FALSE, fixed = FALSE) {
    spClass <- "corHaversine"
    attr(value, "formula") <- form
    attr(value, "nugget") <- nugget
    attr(value, "fixed") <- fixed
    attr(value, "function") <- mimic
    class(value) <- c(spClass, "corStruct")
    value
}   # end corHaversine class
environment(corHaversine) <- asNamespace("nlme")

Dim.corHaversine <- function(object, groups, ...) {
    if (missing(groups)) return(attr(object, "Dim"))
    val <- Dim.corStruct(object, groups)
    val[["start"]] <- c(0, cumsum(val[["len"]] * (val[["len"]] - 1)/2)[-val[["M"]]])
    ## will use third component of Dim list for spClass
    names(val)[3] <- "spClass"
    val[[3]] <- match(attr(object, "function"), c("corSpher", "corExp", "corGaus", "corLin", "corRatio"), 0)
    val
}
environment(Dim.corHaversine) <- asNamespace("nlme")


## getCovariate method for corHaversine class
getCovariate.corHaversine <- function(object, form = formula(object), data) {
    if (is.null(covar <- attr(object, "covariate"))) {          # if object lacks covariate attribute
        if (missing(data)) {                                    # if object lacks data
            stop("need data to calculate covariate")
            }
        covForm <- getCovariateFormula(form)
        if (length(all.vars(covForm)) > 0) {                    # if covariate present
            if (attr(terms(covForm), "intercept") == 1) {       # if formula includes intercept
                covForm <- eval(parse(text = paste("~", deparse(covForm[[2]]),"-1",sep="")))    # remove intercept
                }
            # can only take covariates with correct names
            if (length(all.vars(covForm)) > 2) stop("corHaversine can only take two covariates, 'lon' and 'lat'")
            if ( !all(all.vars(covForm) %in% c("lon", "lat")) ) stop("covariates must be named 'lon' and 'lat'")
            covar <- as.data.frame(unclass(model.matrix(covForm, model.frame(covForm, data, drop.unused.levels = TRUE) ) ) )
            covar <- covar[,order(colnames(covar), decreasing = T)] # order as lon ... lat
            }
        else {
            covar <- NULL
            }

        if (!is.null(getGroupsFormula(form))) {                 # if groups in formula extract covar by groups
            grps <- getGroups(object, data = data)
            if (is.null(covar)) {
                covar <- lapply(split(grps, grps), function(x) as.vector(dist(1:length(x) ) ) )     # filler?
                } 
            else {
                giveDist <- function(el) {
                    el <- as.matrix(el)
                    if (nrow(el) > 1) as.vector(haversineDist(el))                       
                    else numeric(0)
                    }
                covar <- lapply(split(covar, grps), giveDist )
                }
            covar <- covar[sapply(covar, length) > 0]  # no 1-obs groups
            } 
        else {                                  # if no groups in formula extract distance
            if (is.null(covar)) {
                covar <- as.vector(dist(1:nrow(data) ) )
                } 
            else {
                covar <- as.vector(haversineDist(as.matrix(covar) ) )
                }
            }
        if (any(unlist(covar) == 0)) {          # check that no distances are zero
            stop("cannot have zero distances in \"corHaversine\"")
            }
        }
    covar
    }   # end method getCovariate
environment(getCovariate.corHaversine) <- asNamespace("nlme")

To test that this runs, given range parameter of 1000:

## test that corHaversine runs with spherical correlation (not testing that it WORKS ...)
library(MASS)
set.seed(1001)
sample_data <- data.frame(lon = -121:-22, lat = -50:49)
ran <- 1000 # 'range' parameter for spherical correlation
dist_matrix <- as.matrix(haversineDist(sample_data))    # haversine distance matrix
# set up correlation matrix of response
corr_matrix <- 1-1.5*(dist_matrix/ran)+0.5*(dist_matrix/ran)^3
corr_matrix[dist_matrix > ran] = 0
diag(corr_matrix) <- 1
# set up covariance matrix of response
sigma <- 2  # residual standard deviation
cov_matrix <- (diag(100)*sigma) %*% corr_matrix %*% (diag(100)*sigma)   # correlated response
# generate response
sample_data$y <- mvrnorm(1, mu = rep(0, 100), Sigma = cov_matrix)

# fit model
gls_haversine <- gls(y ~ 1, correlation = corHaversine(form=~lon+lat, mimic="corSpher"), data = sample_data)
summary(gls_haversine)

# Correlation Structure: corHaversine
# Formula: ~lon + lat 
# Parameter estimate(s):
#    range 
# 1426.818 
#
# Coefficients:
#                 Value Std.Error  t-value p-value
# (Intercept) 0.9397666 0.7471089 1.257871  0.2114
#
# Standardized residuals:
#        Min         Q1        Med         Q3        Max 
# -2.1467696 -0.4140958  0.1376988  0.5484481  1.9240042 
#
# Residual standard error: 2.735971 
# Degrees of freedom: 100 total; 99 residual

Testing that it runs with Gaussian correlation, with range parameter = 100:

## test that corHaversine runs with Gaussian correlation
ran = 100  # parameter for Gaussian correlation
corr_matrix_gauss <- exp(-(dist_matrix/ran)^2)
diag(corr_matrix_gauss) <- 1
# set up covariance matrix of response
cov_matrix_gauss <- (diag(100)*sigma) %*% corr_matrix_gauss %*% (diag(100)*sigma)   # correlated response
# generate response
sample_data$y_gauss <- mvrnorm(1, mu = rep(0, 100), Sigma = cov_matrix_gauss)

# fit model
gls_haversine_gauss <- gls(y_gauss ~ 1, correlation = corHaversine(form=~lon+lat, mimic = "corGaus"), data = sample_data)
summary(gls_haversine_gauss)

With lme:

## runs with lme
# set up data with group effects
group_y <- as.vector(sapply(1:5, function(.) mvrnorm(1, mu = rep(0, 100), Sigma = cov_matrix_gauss)))
group_effect <- rep(-2:2, each = 100)
group_y = group_y + group_effect
group_name <- factor(group_effect)
lme_dat <- data.frame(y = group_y, group = group_name, lon = sample_data$lon, lat = sample_data$lat)
# fit model
lme_haversine <- lme(y ~ 1, random = ~ 1|group, correlation = corHaversine(form=~lon+lat, mimic = "corGaus"), data = lme_dat, control=lmeControl(opt = "optim") )
summary(lme_haversine)

# Correlation Structure: corHaversine
#  Formula: ~lon + lat | group 
#  Parameter estimate(s):
#    range 
# 106.3482 
# Fixed effects: y ~ 1 
#                  Value Std.Error  DF     t-value p-value
# (Intercept) -0.0161861 0.6861328 495 -0.02359033  0.9812
#
# Standardized Within-Group Residuals:
#        Min         Q1        Med         Q3        Max 
# -3.0393708 -0.6469423  0.0348155  0.7132133  2.5921573 
#
# Number of Observations: 500
# Number of Groups: 5 

这篇关于使用R中的渐变包为线性混合模型指定相关结构的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆