使用 R 中的斜坡包为线性混合模型指定相关结构 [英] Specifying a correlation structure for a linear mixed model using the ramps package in R

查看:13
本文介绍了使用 R 中的斜坡包为线性混合模型指定相关结构的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试创建一个线性混合模型 (lmm),它允许点之间的空间相关性(每个点都有纬度/经度).我希望空间相关性基于点之间的大圆距离.

ramps 包包含一个计算haversine"距离的相关结构——尽管我在实现它时遇到了麻烦.我以前使用过其他相关结构(corGauscorExp)并且没有任何困难.我假设可以以相同的方式实现带有haversine"度量的 corRGaus.

我能够使用 lme 函数成功地创建一个具有空间相关性的 lmm 计算平面距离.

我还能够创建一个线性模型(未混合),其中使用大圆距离计算空间相关性,尽管使用 gls 命令的相关性结构存在错误.

当尝试对具有较大圆距的线性模型使用 gls 命令时,出现以下错误:

x = runif(20, 1,50)y = runif(20, 1,50)gls(x ~ y, cor = corRGaus(form = ~ x + y))REML 拟合的广义最小二乘法型号:x~y数据:空对数限制似然:-78.44925系数:(拦截) y24.762656602 0.007822469相关结构:corRGaus公式:~x + y参数估计:attr(object, "fixed") && 中的错误不受约束:x &&"中的x"类型无效你'

当我增加数据的大小时,会出现内存分配错误(仍然是一个非常小的数据集):

x = runif(100, 1, 50)y = runif(100, 1, 50)纬度 = runif(100, -90, 90)long = runif(100, -180, 180)gls(x ~ y, cor = corRGaus(form = ~ x + y))glsEstimate(glsSt, control = glsEstControl) 中的错误:Calloc"无法分配内存(8 字节的 18446744073709551616)

当尝试使用 lme 命令和 ramps 包中的 corRGaus 运行混合模型时,结果如下:

x = runif(100, 1, 50)y = runif(100, 1, 50)LC = c(rep(1, 50) , rep(2, 50))纬度 = runif(100, -90, 90)long = runif(100, -180, 180)lme(x ~ y,random = ~ y|LC, cor = corRGaus(form = ~ long + lat))`coef<-.corSpatial`(`*tmp*`, value = value[parMap[, i]]) 中的错误:外部函数调用中的 NA/NaN/Inf (arg 1)此外: 警告消息:1: 在 nlminb(c(coef(lmeSt)), function(lmePars) -logLik(lmeSt, lmePars), :NA/NaN 函数评估2: 在 nlminb(c(coef(lmeSt)), function(lmePars) -logLik(lmeSt, lmePars), :NA/NaN 函数评估

我不确定如何继续使用此方法.haversine"函数是我想用来完成我的模型的函数,但是我在实现它们时遇到了麻烦.关于ramps 包的问题很少,我见过的实现也很少.任何帮助将不胜感激.

我之前曾尝试修改 nlme 包,但未能这样做.我发布了一个关于this的问题,其中我被推荐使用 ramps 包.

我在 Windows 8 计算机上使用 R 3.0.0.

解决方案

OK,这里有一个选项,在 gls/nlme 中实现各种空间相关结构,具有haversine distance.

各种corSpatial 类型的类已经具备根据距离度量从空间协变量构建相关矩阵的机制.不幸的是,dist 没有实现半正弦距离,distcorSpatial 调用的函数,用于根据空间协变量计算距离矩阵.>

距离矩阵计算在 getCovariate.corSpatial 中执行.此方法的修改形式将适当的距离传递给其他方法,大多数方法不需要修改.

在这里,我创建了一个新的 corStruct 类,corHaversine,并且只修改了 getCovariate 和一个其他方法(Dimcode>) 决定使用哪个相关函数.那些不需要修改的方法是从等效的 corSpatial 方法复制而来的.corHaversine 中的(新)mimic 参数采用具有相关相关函数的空间类的名称:默认情况下,它设置为corSpher代码>".

警告:除了确保此代码针对球面和高斯相关函数运行之外,我还没有真正做过很多检查.

#### corHaversine - 与半正弦距离的空间相关性# 使用Haversine公式计算由弧度纬度/经度指定的两点之间的测地线距离.# 以公里为单位的输出hasrsine <- 函数(x0, x1, y0, y1) {a <- sin( (y1 - y0)/2 )^2 + cos(y0) * cos(y1) * sin( (x1 - x0)/2 )^2v <- 2 * asin( min(1, sqrt(a) ) )6371 * v}# 给定经度/纬度的两列矩阵,计算测地线半正弦距离的函数# 如果弧度 = F,则假定输入采用十进制度数形式# 注意 fields::rdist.earth 效率更高haversineDist <- 函数(xy,弧度 = F){if (ncol(xy) > 2) stop("输入必须有两列(经度和纬度)")如果(弧度 == F)xy <- xy * pi/180hMat <- 矩阵(NA, ncol = nrow(xy), nrow = nrow(xy))for (i in 1:nrow(xy) ) {for (j in i:nrow(xy) ) {hMat[j,i] <-半正弦(xy[i,1], xy[j,1], xy[i,2], xy[j,2])}}as.dist(hMat)}## 对于大多数方法,来自 corSpatial 的机器无需修改即可工作Initialize.corHaversine <- nlme::::Initialize.corSpatialrecalc.corHaversine <- nlme::::recalc.corSpatialVariogram.corHaversine <- nlme::::Variogram.corSpatialcorFactor.corHaversine <- nlme::::corFactor.corSpatialcorMatrix.corHaversine <- nlme::::corMatrix.corSpatialcoef.corHaversine <- nlme::::coef.corSpatial"coef<-.corHaversine" <- nlme:::"coef<-.corSpatial"## corHaversine 类的构造函数corHaversine <- function(value = numeric(0), form = ~ 1, miim = "corSpher", nugget = FALSE, fixed = FALSE) {spClass <- "corHaversine"attr(value, "formula") <- 形式attr(value, "nugget") <- nuggetattr(value, "fixed") <- 固定attr(value, "function") <- 模仿类(值)<- c(spClass,corStruct")价值} # 结束 corHaversine 类环境(corHaversine)<- asNamespace(nlme")Dim.corHaversine <- function(object, groups, ...) {if (missing(groups)) return(attr(object, "Dim"))val <- Dim.corStruct(对象,组)val[["start"]] <- c(0, cumsum(val[["len"]] * (val[["len"]] - 1)/2)[-val[["M"]]])## 将为 spClass 使用 Dim 列表的第三个组件名称(val)[3] <-spClass"val[[3]] <- match(attr(object, "function"), c("corSpher", "corExp", "corGaus", "corLin", "corRatio"), 0)值}环境(Dim.corHaversine)<- asNamespace(nlme")## corHaversine 类的 getCovariate 方法getCovariate.corHaversine <- function(object, form = formula(object), data) {if (is.null(covar <- attr(object, "covariate"))) { # 如果对象缺少协变量属性if (missing(data)) { # 如果对象缺少数据stop("需要数据来计算协变量")}covForm <- getCovariateFormula(form)if (length(all.vars(covForm)) > 0) { # 如果存在协变量if (attr(terms(covForm), "intercept") == 1) { # 如果公式包括拦截covForm <- eval(parse(text = paste("~", deparse(covForm[[2]]),"-1",sep=""))) # 删除拦截}# 只能采用名称正确的协变量if (length(all.vars(covForm)) > 2) stop("corHaversine 只能取两个协变量,'lon' 和 'lat'")if ( !all(all.vars(covForm) %in% c("lon", "lat")) ) stop("协变量必须命名为'lon'和'lat'")covar <- as.data.frame(unclass(model.matrix(covForm, model.frame(covForm, data, drop.unused.levels = TRUE) ) ) )covar <- covar[,order(colnames(covar), reduction = T)] # order as lon ... lat}别的 {covar <- NULL}if (!is.null(getGroupsFormula(form))) { # 如果公式中的组按组提取 covargrps <- getGroups(object, data = data)如果(is.null(covar)){covar <- lapply(split(grps, grps), function(x) as.vector(dist(1:length(x) ) ) ) # 填充符?}别的 {giveDist <- 函数(el){el <- as.matrix(el)如果(nrow(el)> 1)as.vector(haversineDist(el))其他数字(0)}covar <- lapply(split(covar, grps), giveDist )}covar <- covar[sapply(covar, length) >0] # 没有 1-obs 组}else { # 如果公式中没有组提取距离如果(is.null(covar)){covar <- as.vector(dist(1:nrow(data)))}别的 {covar <- as.vector(haversineDist(as.matrix(covar)))}}if (any(unlist(covar) == 0)) { # 检查没有距离为零stop("在"corHaversine""中不能有零距离)}}科瓦} # 结束方法getCovariate环境(getCovariate.corHaversine)<- asNamespace(nlme")

为了测试它是否运行,给定范围参数 1000:

## 测试 corHaversine 以球面相关性运行(不测试它是否有效......)图书馆(海量)设置种子(1001)sample_data <- data.frame(lon = -121:-22, lat = -50:49)跑 <- 1000 # 球面相关的范围"参数dist_matrix <- as.matrix(haversineDist(sample_data)) # hasrsine距离矩阵# 建立响应的相关矩阵corr_matrix <- 1-1.5*(dist_matrix/ran)+0.5*(dist_matrix/ran)^3corr_matrix[dist_matrix >跑] = 0诊断(corr_matrix)<- 1# 设置响应的协方差矩阵sigma <- 2 # 残差标准差cov_matrix <- (diag(100)*sigma) %*% corr_matrix %*% (diag(100)*sigma) # 相关响应# 生成响应sample_data$y <- mvrnorm(1, mu = rep(0, 100), Sigma = cov_matrix)#拟合模型gls_haversine <- gls(y ~ 1,correlation = corHaversine(form=~lon+lat,imim="corSpher"), data = sample_data)摘要(gls_haversine)# 相关结构:corHaversine# 公式:~lon + lat# 参数估计:#    范围#1426.818## 系数:# Value Std.Error t-value p-value#(拦截)0.9397666 0.7471089 1.257871 0.2114## 标准化残差:# 最低 Q1 Med Q3 Max# -2.1467696 -0.4140958 0.1376988 0.5484481 1.9240042## 残留标准误差:2.735971# 自由度:总共 100;99残

测试它以高斯相关运行,范围参数 = 100:

## 测试 corHaversine 以高斯相关运行ran = 100 # 高斯相关参数corr_matrix_gauss <- exp(-(dist_matrix/ran)^2)诊断(corr_matrix_gauss)<- 1# 设置响应的协方差矩阵cov_matrix_gauss <- (diag(100)*sigma) %*% corr_matrix_gauss %*% (diag(100)*sigma) # 相关响应# 生成响应sample_data$y_gauss <- mvrnorm(1, mu = rep(0, 100), Sigma = cov_matrix_gauss)#拟合模型gls_haversine_gauss <- gls(y_gauss ~ 1,correlation = corHaversine(form=~lon+lat,imim = "corGaus"), data = sample_data)摘要(gls_haversine_gauss)

使用lme:

## 与 lme 一起运行# 使用组效应设置数据group_y <- as.vector(sapply(1:5, function(.) mvrnorm(1, mu = rep(0, 100), Sigma = cov_matrix_gauss)))group_effect <- rep(-2:2, each = 100)group_y = group_y + group_effectgroup_name <- 因子(group_effect)lme_dat <- data.frame(y = group_y, group = group_name, lon = sample_data$lon, lat = sample_data$lat)#拟合模型lme_haversine <- lme(y ~ 1, random = ~ 1|group, correlation = corHaversine(form=~lon+lat, miim = "corGaus"), data = lme_dat, control=lmeControl(opt = "optim") )总结(lme_haversine)# 相关结构:corHaversine# 公式:~lon + lat |团体# 参数估计:#    范围#106.3482# 固定效果:y ~ 1# Value Std.Error DF t-value p-value#(拦截)-0.0161861 0.6861328 495 -0.02359033 0.9812## 标准化组内残差:# 最低 Q1 Med Q3 Max# -3.0393708 -0.6469423 0.0348155 0.7132133 2.5921573## 观察次数:500# 组数:5

I am trying to create a linear mixed model (lmm) that allows for a spatial correlation between points (have lat/long for each point). I would like the spatial correlation to be based upon the great circular distance between points.

The package ramps includes a correlation structure that computes the ‘haversine’ distance – although I am having trouble implementing it. I have previously used other correlation structures (corGaus, corExp) and not had any difficulties. I am assuming the corRGaus with the 'haversine' metric can be implemented in the same way.

I am able to successfully create an lmm with spatial correlation calculated on a planar distance using the lme function.

I am also able to create a linear model (not mixed) with spatial correlation calculated using great circular distance although there are errors with the correlation structure using the gls command.

When trying to the use the gls command for a linear model with the great circular distance I have the following errors:

x = runif(20, 1,50)
y = runif(20, 1,50)
gls(x ~ y, cor = corRGaus(form = ~ x + y))

Generalized least squares fit by REML
 Model: x ~ y 
 Data: NULL 
Log-restricted-likelihood: -78.44925

Coefficients:
 (Intercept)            y 
24.762656602  0.007822469 

Correlation Structure: corRGaus
 Formula: ~x + y 
 Parameter estimate(s):
Error in attr(object, "fixed") && unconstrained : 
 invalid 'x' type in 'x && y'

When I increase the size of the data there are memory allocation errors (still a very small dataset):

x = runif(100, 1, 50)
y = runif(100, 1, 50)
lat = runif(100, -90, 90)
long = runif(100, -180, 180)
gls(x ~ y, cor = corRGaus(form = ~ x + y))

Error in glsEstimate(glsSt, control = glsEstControl) : 
'Calloc' could not allocate memory (18446744073709551616 of 8 bytes)

When trying to run a mixed model using the lme command and the corRGaus from the ramps package the following results:

x = runif(100, 1, 50)
y = runif(100, 1, 50)
LC = c(rep(1, 50) , rep(2, 50))
lat = runif(100, -90, 90)
long = runif(100, -180, 180)

lme(x ~ y,random = ~ y|LC, cor = corRGaus(form = ~ long + lat))

Error in `coef<-.corSpatial`(`*tmp*`, value = value[parMap[, i]]) : 
  NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning messages:
1: In nlminb(c(coef(lmeSt)), function(lmePars) -logLik(lmeSt, lmePars),  :
  NA/NaN function evaluation
2: In nlminb(c(coef(lmeSt)), function(lmePars) -logLik(lmeSt, lmePars),  :
  NA/NaN function evaluation

I am unsure about how to proceed with this method. The "haversine" function is what I want to use to complete my models, but I am having trouble implementing them. There are very few questions anywhere about the ramps package, and I have seen very few implementations. Any helps would be greatly appreciated.

I have previously attempted to modify the nlme package and was unable to do so. I posted a question about this, where I was recommended to use the ramps package.

I am using R 3.0.0 on a Windows 8 computer.

解决方案

OK, here is an option that implements various spatial correlation structures in gls/nlme with haversine distance.

The various corSpatial-type classes already have machinery in place to construct a correlation matrix from spatial covariates, given a distance metric. Unfortunately, dist does not implement haversine distance, and dist is the function called by corSpatial to compute a distance matrix from the spatial covariates.

The distance matrix computations are performed in getCovariate.corSpatial. A modified form of this method will pass the appropriate distance to other methods, and the majority of methods will not need to be modified.

Here, I create a new corStruct class, corHaversine, and modify only getCovariate and one other method (Dim) that determines which correlation function is used. Those methods which do not need modification, are copied from equivalent corSpatial methods. The (new) mimic argument in corHaversine takes the name of the spatial class with the correlation function of interest: by default, it is set to "corSpher".

Caveat: beyond ensuring that this code runs for spherical and Gaussian correlation functions, I haven't really done a lot of checking.

#### corHaversine - spatial correlation with haversine distance

# Calculates the geodesic distance between two points specified by radian latitude/longitude using Haversine formula.
# output in km
haversine <- function(x0, x1, y0, y1) {
    a <- sin( (y1 - y0)/2 )^2 + cos(y0) * cos(y1) * sin( (x1 - x0)/2 )^2
    v <- 2 * asin( min(1, sqrt(a) ) )
    6371 * v
}

# function to compute geodesic haversine distance given two-column matrix of longitude/latitude
# input is assumed in form decimal degrees if radians = F
# note fields::rdist.earth is more efficient
haversineDist <- function(xy, radians = F) {
    if (ncol(xy) > 2) stop("Input must have two columns (longitude and latitude)")
    if (radians == F) xy <- xy * pi/180
    hMat <- matrix(NA, ncol = nrow(xy), nrow = nrow(xy))
    for (i in 1:nrow(xy) ) {
        for (j in i:nrow(xy) ) {
            hMat[j,i] <- haversine(xy[i,1], xy[j,1], xy[i,2], xy[j,2]) 
            }
        }
    as.dist(hMat)
}

## for most methods, machinery from corSpatial will work without modification
Initialize.corHaversine <- nlme:::Initialize.corSpatial
recalc.corHaversine <- nlme:::recalc.corSpatial
Variogram.corHaversine <- nlme:::Variogram.corSpatial
corFactor.corHaversine <- nlme:::corFactor.corSpatial
corMatrix.corHaversine <- nlme:::corMatrix.corSpatial
coef.corHaversine <- nlme:::coef.corSpatial
"coef<-.corHaversine" <- nlme:::"coef<-.corSpatial"

## Constructor for the corHaversine class
corHaversine <- function(value = numeric(0), form = ~ 1, mimic = "corSpher", nugget = FALSE, fixed = FALSE) {
    spClass <- "corHaversine"
    attr(value, "formula") <- form
    attr(value, "nugget") <- nugget
    attr(value, "fixed") <- fixed
    attr(value, "function") <- mimic
    class(value) <- c(spClass, "corStruct")
    value
}   # end corHaversine class
environment(corHaversine) <- asNamespace("nlme")

Dim.corHaversine <- function(object, groups, ...) {
    if (missing(groups)) return(attr(object, "Dim"))
    val <- Dim.corStruct(object, groups)
    val[["start"]] <- c(0, cumsum(val[["len"]] * (val[["len"]] - 1)/2)[-val[["M"]]])
    ## will use third component of Dim list for spClass
    names(val)[3] <- "spClass"
    val[[3]] <- match(attr(object, "function"), c("corSpher", "corExp", "corGaus", "corLin", "corRatio"), 0)
    val
}
environment(Dim.corHaversine) <- asNamespace("nlme")


## getCovariate method for corHaversine class
getCovariate.corHaversine <- function(object, form = formula(object), data) {
    if (is.null(covar <- attr(object, "covariate"))) {          # if object lacks covariate attribute
        if (missing(data)) {                                    # if object lacks data
            stop("need data to calculate covariate")
            }
        covForm <- getCovariateFormula(form)
        if (length(all.vars(covForm)) > 0) {                    # if covariate present
            if (attr(terms(covForm), "intercept") == 1) {       # if formula includes intercept
                covForm <- eval(parse(text = paste("~", deparse(covForm[[2]]),"-1",sep="")))    # remove intercept
                }
            # can only take covariates with correct names
            if (length(all.vars(covForm)) > 2) stop("corHaversine can only take two covariates, 'lon' and 'lat'")
            if ( !all(all.vars(covForm) %in% c("lon", "lat")) ) stop("covariates must be named 'lon' and 'lat'")
            covar <- as.data.frame(unclass(model.matrix(covForm, model.frame(covForm, data, drop.unused.levels = TRUE) ) ) )
            covar <- covar[,order(colnames(covar), decreasing = T)] # order as lon ... lat
            }
        else {
            covar <- NULL
            }

        if (!is.null(getGroupsFormula(form))) {                 # if groups in formula extract covar by groups
            grps <- getGroups(object, data = data)
            if (is.null(covar)) {
                covar <- lapply(split(grps, grps), function(x) as.vector(dist(1:length(x) ) ) )     # filler?
                } 
            else {
                giveDist <- function(el) {
                    el <- as.matrix(el)
                    if (nrow(el) > 1) as.vector(haversineDist(el))                       
                    else numeric(0)
                    }
                covar <- lapply(split(covar, grps), giveDist )
                }
            covar <- covar[sapply(covar, length) > 0]  # no 1-obs groups
            } 
        else {                                  # if no groups in formula extract distance
            if (is.null(covar)) {
                covar <- as.vector(dist(1:nrow(data) ) )
                } 
            else {
                covar <- as.vector(haversineDist(as.matrix(covar) ) )
                }
            }
        if (any(unlist(covar) == 0)) {          # check that no distances are zero
            stop("cannot have zero distances in "corHaversine"")
            }
        }
    covar
    }   # end method getCovariate
environment(getCovariate.corHaversine) <- asNamespace("nlme")

To test that this runs, given range parameter of 1000:

## test that corHaversine runs with spherical correlation (not testing that it WORKS ...)
library(MASS)
set.seed(1001)
sample_data <- data.frame(lon = -121:-22, lat = -50:49)
ran <- 1000 # 'range' parameter for spherical correlation
dist_matrix <- as.matrix(haversineDist(sample_data))    # haversine distance matrix
# set up correlation matrix of response
corr_matrix <- 1-1.5*(dist_matrix/ran)+0.5*(dist_matrix/ran)^3
corr_matrix[dist_matrix > ran] = 0
diag(corr_matrix) <- 1
# set up covariance matrix of response
sigma <- 2  # residual standard deviation
cov_matrix <- (diag(100)*sigma) %*% corr_matrix %*% (diag(100)*sigma)   # correlated response
# generate response
sample_data$y <- mvrnorm(1, mu = rep(0, 100), Sigma = cov_matrix)

# fit model
gls_haversine <- gls(y ~ 1, correlation = corHaversine(form=~lon+lat, mimic="corSpher"), data = sample_data)
summary(gls_haversine)

# Correlation Structure: corHaversine
# Formula: ~lon + lat 
# Parameter estimate(s):
#    range 
# 1426.818 
#
# Coefficients:
#                 Value Std.Error  t-value p-value
# (Intercept) 0.9397666 0.7471089 1.257871  0.2114
#
# Standardized residuals:
#        Min         Q1        Med         Q3        Max 
# -2.1467696 -0.4140958  0.1376988  0.5484481  1.9240042 
#
# Residual standard error: 2.735971 
# Degrees of freedom: 100 total; 99 residual

Testing that it runs with Gaussian correlation, with range parameter = 100:

## test that corHaversine runs with Gaussian correlation
ran = 100  # parameter for Gaussian correlation
corr_matrix_gauss <- exp(-(dist_matrix/ran)^2)
diag(corr_matrix_gauss) <- 1
# set up covariance matrix of response
cov_matrix_gauss <- (diag(100)*sigma) %*% corr_matrix_gauss %*% (diag(100)*sigma)   # correlated response
# generate response
sample_data$y_gauss <- mvrnorm(1, mu = rep(0, 100), Sigma = cov_matrix_gauss)

# fit model
gls_haversine_gauss <- gls(y_gauss ~ 1, correlation = corHaversine(form=~lon+lat, mimic = "corGaus"), data = sample_data)
summary(gls_haversine_gauss)

With lme:

## runs with lme
# set up data with group effects
group_y <- as.vector(sapply(1:5, function(.) mvrnorm(1, mu = rep(0, 100), Sigma = cov_matrix_gauss)))
group_effect <- rep(-2:2, each = 100)
group_y = group_y + group_effect
group_name <- factor(group_effect)
lme_dat <- data.frame(y = group_y, group = group_name, lon = sample_data$lon, lat = sample_data$lat)
# fit model
lme_haversine <- lme(y ~ 1, random = ~ 1|group, correlation = corHaversine(form=~lon+lat, mimic = "corGaus"), data = lme_dat, control=lmeControl(opt = "optim") )
summary(lme_haversine)

# Correlation Structure: corHaversine
#  Formula: ~lon + lat | group 
#  Parameter estimate(s):
#    range 
# 106.3482 
# Fixed effects: y ~ 1 
#                  Value Std.Error  DF     t-value p-value
# (Intercept) -0.0161861 0.6861328 495 -0.02359033  0.9812
#
# Standardized Within-Group Residuals:
#        Min         Q1        Med         Q3        Max 
# -3.0393708 -0.6469423  0.0348155  0.7132133  2.5921573 
#
# Number of Observations: 500
# Number of Groups: 5 

这篇关于使用 R 中的斜坡包为线性混合模型指定相关结构的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆