自动比较来自鼠标的 glm.mids 的嵌套模型 [英] Automatically compare nested models from mice's glm.mids

查看:22
本文介绍了自动比较来自鼠标的 glm.mids 的嵌套模型的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个来自 R 的 mice 包的乘法估算模型,其中有很多因子变量.例如:

I have a multiply-imputed model from R's mice package in which there are lots of factor variables. For example:

library(mice)
library(Hmisc)

# turn all the variables into factors
fake = nhanes
fake$age = as.factor(nhanes$age)
fake$bmi = cut2(nhanes$bmi, g=3) 
fake$chl = cut2(nhanes$chl, g=3) 

head(fake)
  age         bmi hyp       chl
1   1        <NA>  NA      <NA>
2   2 [20.4,25.5)   1 [187,206)
3   1        <NA>   1 [187,206)
4   3        <NA>  NA      <NA>
5   1 [20.4,25.5)   1 [113,187)
6   3        <NA>  NA [113,187)

imput = mice(nhanes)

# big model
fit1 = glm.mids((hyp==2) ~ age + bmi + chl, data=imput, family = binomial)

我想通过针对每次删除一个变量的每个可能嵌套模型测试完整模型来测试模型中每个整个因子变量(而不是每个级别的指标变量)的显着性.手动,我可以做到:

I want to test the significance of each entire factor variable in the model (not the indicator variables for each level) by testing the full model against each possible nested model that drops one variable at a time. Manually, I can do:

# small model (no chl)
fit2 = glm.mids((hyp==2) ~ age + bmi, data=imput, family = binomial)

# extract p-value from pool.compare
pool.compare(fit1, fit2)$pvalue

如何为模型中的所有因子变量自动执行此操作?非常有用的函数 drop1 被推荐给我用于 之前的问题 -- 现在我想做一些与 mice 情况完全一样的事情.

How can I do this automatically for all the factor variables in my model? The very helpful function drop1 was suggested to me for a previous question -- now I want to do something exactly like that except for the mice case.

可能有帮助的注意事项:pool.compare 的一个令人讨厌的功能是它似乎希望将较大模型中的额外"变量放在与较小的模型共享.

Possibly helpful note: An annoying feature of pool.compare is that it appears to want the "extra" variables in the larger model to be placed after the ones that are shared with the smaller model.

推荐答案

按照 pool.compare 所需的顺序排列后,您可以使用循环遍历预测变量的不同组合.

You can use a loop to iterate through the different combinations of predictors, after arranging them in the order required for pool.compare.

所以使用上面的 fake 数据 - 调整类别数量

So using your fake data from above - tweaked the number of categories

library(mice)
library(Hmisc)
# turn all the variables into factors
# turn all the variables into factors
fake <- nhanes
fake$age <- as.factor(nhanes$age)
fake$bmi <- cut2(nhanes$bmi, g=2) 
fake$chl <- cut2(nhanes$chl, g=2) 

# Impute
imput <- mice(fake, seed=1)

# Create models 
# - reduced models with one variable removed
# - full models with extra variables at end of expression
vars <- c("age", "bmi", "chl")

red <- combn(vars, length(vars)-1 , simplify=FALSE)
diffs <- lapply(red, function(i) setdiff(vars, i) )
(full <- lapply(1:length(red), function(i) 
                            paste(c(red[[i]], diffs[[i]]), collapse=" + ")))
#[[1]]
#[1] "age + bmi + chl"

#[[2]]
#[1] "age + chl + bmi"

#[[3]]
#[1] "bmi + chl + age"

(red <- combn(vars, length(vars)-1 , FUN=paste, collapse=" + "))
#[1] "age + bmi" "age + chl" "bmi + chl"

模型现在以正确的顺序传递给 glm 调用.我还替换了 glm.mids 方法,因为它已被替换为 with.mids - 参见 ?glm.mids

The models are now in the correct order to pass to the glm call. I've also replaced glm.mids method as it has been replaced by with.mids - see ?glm.mids

out <- vector("list", length(red))

for( i in 1:length(red)) {

  redMod <-  with(imput, 
               glm(formula(paste("(hyp==2) ~ ", red[[i]])), family = binomial))

  fullMod <-  with(imput, 
               glm(formula(paste("(hyp==2) ~ ", full[[i]])), family = binomial))

  out[[i]] <- list(predictors = diffs[[i]], 
                   pval = c(pool.compare(fullMod, redMod)$pvalue))
   }

do.call(rbind.data.frame, out)
#    predictors      pval
#2         chl 0.9976629
#21        bmi 0.9985028
#3         age 0.9815831

# Check manually by leaving out chl
mod1 <- with(imput, glm((hyp==2) ~ age + bmi + chl , family = binomial))
mod2 <- with(imput, glm((hyp==2) ~ age + bmi , family = binomial))
pool.compare(mod1, mod2)$pvalue
#         [,1]
#[1,] 0.9976629

使用这个数据集你会得到很多警告

You will get a lot of warnings using this dataset

编辑

你可以把它包装在一个函数中

You could wrap this in a function

impGlmDrop1 <- function(vars, outcome, Data=imput,  Family="binomial") 
{

  red <- combn(vars, length(vars)-1 , simplify=FALSE)
  diffs <- lapply(red, function(i) setdiff(vars, i))
  full <- lapply(1:length(red), function(i) 
                      paste(c(red[[i]], diffs[[i]]), collapse=" + "))
  red <- combn(vars, length(vars)-1 , FUN=paste, collapse=" + ")

  out <- vector("list", length(red))
  for( i in 1:length(red)) {

  redMod <-  with(Data, 
              glm(formula(paste(outcome, red[[i]], sep="~")), family = Family))
  fullMod <-  with(Data, 
              glm(formula(paste(outcome, full[[i]], sep="~")), family = Family))
  out[[i]] <- list(predictors = diffs[[i]], 
                   pval = c(pool.compare(fullMod, redMod)$pvalue)  )
  }
  do.call(rbind.data.frame, out)
}

# Run
impGlmDrop1(c("age", "bmi", "chl"), "(hyp==2)")

这篇关于自动比较来自鼠标的 glm.mids 的嵌套模型的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆