通过glm()以编程方式将family =传递给step() [英] pass family= to step() via glm() programmatically

查看:140
本文介绍了通过glm()以编程方式将family =传递给step()的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图通过仿真来演示不同模型和特征选择技术的性能,因此我希望通过编程将各种参数传递给glm().

?glm下,我们阅读(斜体字):

家族:对模型中使用的错误分布和链接功能的描述.对于glm,这可以是一个字符串,命名为 家庭功能,家庭功能或调用家庭功能的结果.对于glm.fit,仅支持第三个选项. (有关家庭功能的详细信息,请参阅家庭.)

问题在于,当我随后在结果模型上调用step()时,似乎存在范围问题,并且family=参数不再被识别.

这是一个最小的示例:

getCoef <- function(formula, 
                family = c("gaussian", "binomial"),
                data){

  model_fam <- match.arg(family, c("gaussian", "binomial"))

  fit_null <- glm(update(formula,".~1"), 
                   family = model_fam, 
                   data = data)

  message("So far so good")

  fit_stepBIC <- step(fit_null, 
                      formula, 
                      direction="forward",
                      k = log(nrow(data)),
                      trace=0)

  message("Doesn't make it this far")

  fit_stepBIC$coefficients
}

# returns error 'model_fam' not found 
getCoef(Petal.Length ~ Petal.Width + Species, family = "gaussian", data = iris)

带有回溯的错误消息:

> getCoef(Petal.Length ~ Petal.Width + Species, family = "gaussian", data = iris)
So far so good

 Error in stats::glm(formula = Petal.Length ~ Petal.Width + Species, family = model_fam,  : 
  object 'model_fam' not found 
9 stats::glm(formula = Petal.Length ~ Petal.Width + Species, family = model_fam, 
    data = data, method = "model.frame") 
8 eval(expr, envir, enclos) 
7 eval(fcall, env) 
6 model.frame.glm(fob, xlev = object$xlevels) 
5 model.frame(fob, xlev = object$xlevels) 
4 add1.glm(fit, scope$add, scale = scale, trace = trace, k = k, 
    ...) 
3 add1(fit, scope$add, scale = scale, trace = trace, k = k, ...) 
2 step(fit_null, formula, direction = "forward", k = log(nrow(data)), 
    trace = 0) 
1 getCoef(Petal.Length ~ Petal.Width + Species, family = "gaussian", 
    data = iris) 

> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] rsconnect_0.4.1.11 tools_3.2.4       

传递此参数以便逐步识别的最自然方法是什么?我知道的一种可能的解决方法是通过以model_fam为条件的if-then-else调用具有显式姓氏的glm().

我认为以下基于evalbquote.()的解决方案可能会解决您的问题.

我还安装了R版本3.2.4,并且从您的代码中得到了与您完全相同的错误.下面的解决方案使其可以在我的计算机上工作.

getCoef <- function(formula, 
                family = c("gaussian", "binomial"),
                data){

    model_fam <- match.arg(family, c("gaussian", "binomial"))

    fit_null <- eval(bquote(
        glm(update(.(formula),".~1"), 
            family = .(model_fam), 
            data = .(data))))

    message("So far so good")

    fit_stepBIC <- step(fit_null, 
                        formula, 
                        direction="forward",
                        k = log(nrow(data)),
                        trace=0)

    message("Doesn't make it this far")

    fit_stepBIC$coefficients
}

# returns error 'model_fam' not found 
 getCoef(formula = Petal.Length ~ Petal.Width + Species,
        family = "gaussian",
        data = iris)

So far so good
Doesn't make it this far
      (Intercept) Speciesversicolor  Speciesvirginica       Petal.Width 
         1.211397          1.697791          2.276693          1.018712   

I am trying to demonstrate via simulation the performance of different models and feature selection techniques, so I wish to pass various arguments to glm() programmatically.

Under ?glm we read (italics mine):

family: a description of the error distribution and link function to be used in the model. For glm this can be a character string naming a family function, a family function or the result of a call to a family function. For glm.fit only the third option is supported. (See family for details of family functions.)

The problem is that when I then call step() on the resulting model, there seems to be a scoping problem and the family= parameter is no longer recognized.

Here is a minimal example:

getCoef <- function(formula, 
                family = c("gaussian", "binomial"),
                data){

  model_fam <- match.arg(family, c("gaussian", "binomial"))

  fit_null <- glm(update(formula,".~1"), 
                   family = model_fam, 
                   data = data)

  message("So far so good")

  fit_stepBIC <- step(fit_null, 
                      formula, 
                      direction="forward",
                      k = log(nrow(data)),
                      trace=0)

  message("Doesn't make it this far")

  fit_stepBIC$coefficients
}

# returns error 'model_fam' not found 
getCoef(Petal.Length ~ Petal.Width + Species, family = "gaussian", data = iris)

Error message with traceback:

> getCoef(Petal.Length ~ Petal.Width + Species, family = "gaussian", data = iris)
So far so good

 Error in stats::glm(formula = Petal.Length ~ Petal.Width + Species, family = model_fam,  : 
  object 'model_fam' not found 
9 stats::glm(formula = Petal.Length ~ Petal.Width + Species, family = model_fam, 
    data = data, method = "model.frame") 
8 eval(expr, envir, enclos) 
7 eval(fcall, env) 
6 model.frame.glm(fob, xlev = object$xlevels) 
5 model.frame(fob, xlev = object$xlevels) 
4 add1.glm(fit, scope$add, scale = scale, trace = trace, k = k, 
    ...) 
3 add1(fit, scope$add, scale = scale, trace = trace, k = k, ...) 
2 step(fit_null, formula, direction = "forward", k = log(nrow(data)), 
    trace = 0) 
1 getCoef(Petal.Length ~ Petal.Width + Species, family = "gaussian", 
    data = iris) 

> sessionInfo()
R version 3.2.4 Revised (2016-03-16 r70336)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] rsconnect_0.4.1.11 tools_3.2.4       

What is the most natural way to pass this parameter so that is recognized by step? One possible workaround I'm aware of would be to call glm() with the explicit family name via if-then-else conditioned on model_fam.

解决方案

I think the following solution, based on eval, bquote and .() might solve your problem.

I also have R-version 3.2.4 installed, and I got the exact same error you got from your code. The solution below made it work at my computer.

getCoef <- function(formula, 
                family = c("gaussian", "binomial"),
                data){

    model_fam <- match.arg(family, c("gaussian", "binomial"))

    fit_null <- eval(bquote(
        glm(update(.(formula),".~1"), 
            family = .(model_fam), 
            data = .(data))))

    message("So far so good")

    fit_stepBIC <- step(fit_null, 
                        formula, 
                        direction="forward",
                        k = log(nrow(data)),
                        trace=0)

    message("Doesn't make it this far")

    fit_stepBIC$coefficients
}

# returns error 'model_fam' not found 
 getCoef(formula = Petal.Length ~ Petal.Width + Species,
        family = "gaussian",
        data = iris)

So far so good
Doesn't make it this far
      (Intercept) Speciesversicolor  Speciesvirginica       Petal.Width 
         1.211397          1.697791          2.276693          1.018712   

这篇关于通过glm()以编程方式将family =传递给step()的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆