Missing object error when using step() within a user-defined function

Question

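5 days and still no answer

  • As Simon's comments show, this is a reproducible and very strange problem. It seems to arise only when a stepwise regression with very high predictive power is wrapped in a function.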

I have been struggling with this for a while and any help would be much appreciated. I am trying to write a function that runs several stepwise regressions and outputs all of them to a list. However, R is having trouble reading the dataset that I specify in my function arguments. I found several similar errors on various boards (here, here, and here), however none of them seemed to ever get resolved. It all comes down to some weird issues with calling step() in a user-defined function. I am using the following script to test my code. Run the whole thing several times until an error arises (trust me, it will):

test.df <- data.frame(a = sample(0:1, 100, replace = TRUE),
                      b = as.factor(sample(0:5, 100, replace = TRUE)),
                      c = runif(100, 0, 100),
                      d = rnorm(100, 50, 50))
test.df$b[10:100] <- test.df$a[10:100] #making sure that at least one of the variables has some predictive power

stepModel <- function(modeling.formula, dataset, outfile = NULL) {
  if (is.null(outfile) == FALSE){
    sink(file = outfile,
         append = TRUE, type = "output")
    print("")
    print("Models run at:")
    print(Sys.time())
  }
  model.initial <- glm(modeling.formula,
                       family = binomial,
                       data = dataset)
  model.stepwise1 <- step(model.initial, direction = "backward")
  model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
  output <- list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
  sink()
  return(output)
}

blah <- stepModel(a~., dataset = test.df)

This returns the following error message (if the error does not show up right away, keep re-running the test.df script as well as the call for stepModel(), it will show up eventually):

Error in is.data.frame(data) : object 'dataset' not found
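
The "keep re-running until it fails" procedure can be made repeatable by fixing the random seed and recording which seeds trigger an error. The following is only a sketch: `make_test_df`, `step_model`, and `scan_seeds` are hypothetical helper names, the sink code is omitted, and `trace = 0` is added to keep the output quiet. Which seeds fail, if any, depends on the R version and its random number generator, so no particular result is claimed here.

```r
# Hypothetical helper: rebuild the test data from the question.
make_test_df <- function() {
  df <- data.frame(a = sample(0:1, 100, replace = TRUE),
                   b = factor(sample(0:5, 100, replace = TRUE), levels = 0:5),
                   c = runif(100, 0, 100),
                   d = rnorm(100, 50, 50))
  df$b[10:100] <- df$a[10:100]  # make b strongly predictive of a
  df
}

# Stripped-down version of stepModel(): same three modelling calls, no sink.
step_model <- function(modeling.formula, dataset) {
  model.initial <- glm(modeling.formula, family = binomial, data = dataset)
  model.stepwise1 <- step(model.initial, direction = "backward", trace = 0)
  step(model.stepwise1, scope = ~.^2, trace = 0)
}

# For each seed, return the error message, or NA when the run succeeds.
scan_seeds <- function(seeds) {
  vapply(seeds, function(s) {
    set.seed(s)
    df <- make_test_df()
    tryCatch({
      suppressWarnings(step_model(a ~ ., dataset = df))
      NA_character_
    }, error = function(e) conditionMessage(e))
  }, character(1))
}

msgs <- scan_seeds(1:5)
failed <- !is.na(msgs)       # TRUE for seeds that raised an error
print(data.frame(seed = 1:5, failed = failed, message = msgs))
```

Any seed flagged as failing can then be reused with `set.seed()` to reproduce the error on demand.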

I have determined that everything runs fine up until model.stepwise2 starts to get built. Somehow, the temporary object 'dataset' works just fine for the first stepwise regression, but fails to be recognized by the second. I found this by commenting out part of the function as can be seen below. This code will run fine, proving that the object 'dataset' was originally being recognized:

stepModel1 <- function(modeling.formula, dataset, outfile = NULL) {
  if (is.null(outfile) == FALSE){
    sink(file = outfile,
         append = TRUE, type = "output")
    print("")
    print("Models run at:")
    print(Sys.time())
  }
  model.initial <- glm(modeling.formula,
                       family = binomial,
                       data = dataset)
  model.stepwise1 <- step(model.initial, direction = "backward")
#   model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
#   sink()
#   output <- list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
  return(model.stepwise1)
}

blah1 <- stepModel1(a~., dataset = test.df) 

EDIT - before anyone asks, all the summary() functions were there because the full function (I edited it so that you could focus on the error) has another piece that defines a file to which you can output the stepwise trace. I just got rid of them.

EDIT 2 - session info

sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] tcltk     stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] sqldf_0.4-6.4         RSQLite.extfuns_0.0.1 RSQLite_0.11.3        chron_2.3-43         
 [5] gsubfn_0.6-5          proto_0.3-10          DBI_0.2-6             ggplot2_0.9.3.1      
 [9] caret_5.15-61         reshape2_1.2.2        lattice_0.20-6        foreach_1.4.0        
[13] cluster_1.14.2        plyr_1.8             

loaded via a namespace (and not attached):
 [1] codetools_0.2-8    colorspace_1.2-1   dichromat_2.0-0    digest_0.6.2       grid_2.15.1       
 [6] gtable_0.1.2       iterators_1.0.6    labeling_0.1       MASS_7.3-18        munsell_0.4       
[11] RColorBrewer_1.0-5 scales_0.2.3       stringr_0.6.2      tools_2.15

EDIT 3 - this performs all the same operations as the function, just without using a function. This will run fine every time, even when the algorithm doesn't converge:

modeling.formula <- a~.
dataset <- test.df
outfile <- NULL
if (is.null(outfile) == FALSE){
  sink(file = outfile,
       append = TRUE, type = "output")
  print("")
  print("Models run at:")
  print(Sys.time())
}
  model.initial <- glm(modeling.formula,
                       family = binomial,
                       data = dataset)
  model.stepwise1 <- step(model.initial, direction = "backward")
  model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
  output <- list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)

Recommended answer

Using do.call to refer to the data set in the calling environment works for me. See https://stackoverflow.com/a/7668846/210673 for the original suggestion. Here's a version that works (with sink code removed).

stepModel2 <- function(modeling.formula, dataset) {
  model.initial <- do.call("glm", list(modeling.formula,
                       family = "binomial",
                       data = as.name(dataset)))
  model.stepwise1 <- step(model.initial, direction = "backward")
  model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
  list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
}

blah <- stepModel2(a~., dataset = "test.df")

It fails for me consistently with set.seed(6) with the original code. The reason it fails is that the dataset variable is not present within the step function, and although it's not needed in making model.stepwise1, it is needed for model.stepwise2 when model.stepwise1 keeps a linear term. So that's the case when your version fails. Calling the dataset from the global environment as I do here fixes this issue.
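
One way to see the difference between the two versions is to inspect the call stored in each fitted model, since that stored call is what gets re-evaluated when a model is refitted. The sketch below uses a small made-up data frame (`demo.df`) and two hypothetical wrappers (`fit_direct`, `fit_docall`); it only illustrates which name ends up in the `data` argument and does not run step() itself.

```r
set.seed(1)
# Made-up data, only for illustration.
demo.df <- data.frame(y = rbinom(50, 1, 0.5), x = rnorm(50))

# Direct call: the stored call refers to the function argument `dataset`.
fit_direct <- function(modeling.formula, dataset) {
  glm(modeling.formula, family = binomial, data = dataset)
}

# do.call with as.name(): the stored call refers to the global object by name.
fit_docall <- function(modeling.formula, dataset) {
  do.call("glm", list(modeling.formula, family = "binomial",
                      data = as.name(dataset)))
}

m1 <- fit_direct(y ~ x, dataset = demo.df)
m2 <- fit_docall(y ~ x, dataset = "demo.df")

data_arg1 <- deparse(m1$call$data)  # "dataset"
data_arg2 <- deparse(m2$call$data)  # "demo.df"
print(c(direct = data_arg1, do.call = data_arg2))
```

In the direct version the model remembers only the symbol `dataset`, which exists solely inside the wrapper function, whereas the do.call version remembers `demo.df`, which can be found from the global environment.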
