R - model.frame() and non-standard evaluation


Question

I am puzzled by the behaviour of a function that I am trying to write. My example comes from the survival package, but I think the question is more general than that. Basically, the following code

library(survival)
data(bladder)  ## this will load "bladder", "bladder1" and "bladder2"

mod_init <- coxph(Surv(start, stop, event) ~ rx + number, data = bladder2, method = "breslow")
survfit(mod_init)

yields the object I am interested in. However, when I wrap it in a function,

my_function <- function(formula, data) {
  mod_init <- coxph(formula = formula, data = data, method = "breslow")
  survfit(mod_init)
  }

my_function(Surv(start, stop, event) ~ rx + number, data = bladder2)

the function returns an error at its last line:

 Error in eval(predvars, data, env) : 
  invalid 'envir' argument of type 'closure' 
10 eval(predvars, data, env) 
9 model.frame.default(formula = Surv(start, stop, event) ~ rx + 
    number, data = data) 
8 stats::model.frame(formula = Surv(start, stop, event) ~ rx + 
    number, data = data) 
7 eval(expr, envir, enclos) 
6 eval(temp, environment(formula$terms), parent.frame()) 
5 model.frame.coxph(object) 
4 stats::model.frame(object) 
3 survfit.coxph(mod_init) 
2 survfit(mod_init) 
1 my_function(Surv(start, stop, event) ~ rx + number, data = bladder2) 

I am curious whether I am missing something obvious or whether this behaviour is normal. I find it strange, since the environment of my_function contains the same objects as the global environment does when I run the first portion of the code.

Edit: I also received useful input from Terry Therneau, the author of the survival package. This is his answer:

This is a problem that stems from the non-standard evaluation done by model.frame. The only way out that I have found is to add model.frame=TRUE to the original coxph call. I consider this a serious design flaw in R. Non-standard evaluation is like the dark side: a tempting and easy path that always ends badly.
Terry T.

Recommended answer

Diagnosis

From the error message:

2 survfit(mod_init)
1 my_function(Surv(start, stop, event) ~ rx + number, data = bladder2) 

the problem is clearly not with coxph during model fitting, but with survfit.

Then, from this part of the message:

10 eval(predvars, data, env) 
 9 model.frame.default(formula = Surv(start, stop, event) ~ rx + 
     number, data = data) 

I can tell that, at an early stage of survfit, the function model.frame.default() cannot find the data it needs to rebuild the model frame for the formula Surv(start, stop, event) ~ rx + number. Hence it complains.
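The wording of the error ("invalid 'envir' argument of type 'closure'") can be traced with a small sketch. The stored call contains data = data, and the traceback shows it being re-evaluated in the formula's environment rather than inside my_function. Assuming the formula was written at top level and no object named data exists there, the name data resolves to the function utils::data(), which is a closure. The helper where_is_data below is hypothetical and only illustrates the lookup:

```r
# The formula is created where the wrapper is called, not inside the wrapper,
# so it remembers the caller's environment.
f <- y ~ x

# Hypothetical helper: compare what `data` means inside the wrapper with what
# it means when looked up from the formula's environment.
where_is_data <- function(formula, data) {
  c(
    in_wrapper       = class(data)[1],
    from_formula_env = class(get("data", envir = environment(formula)))[1]
  )
}

res <- where_is_data(f, data.frame(x = 1:3, y = c(2, 4, 6)))
print(res)
```

Inside the wrapper, data is the data frame that was passed in; seen from the formula's environment it is a function, which is what eval() rejects.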

What is a model frame?

A model frame is formed from the data argument passed to a fitting routine such as lm(), glm() or mgcv::gam(). It is a data frame that normally has the same number of rows as data (rows with missing values may be dropped), but with two changes:


  • all variables not referenced by the formula are dropped
  • several attributes are added, the most important of which is the environment (carried by the "terms" attribute)
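Both points can be checked on a toy data frame. This is a minimal sketch; the data frame toy and its columns are made up for illustration:

```r
# Toy data with one column the formula does not use
toy <- data.frame(
  x      = 1:5,
  y      = c(2.1, 3.9, 6.2, 7.8, 10.1),
  unused = letters[1:5]
)

mf <- model.frame(y ~ x, data = toy)

names(mf)                # the `unused` column is gone
nrow(mf) == nrow(toy)    # same number of rows (no missing values here)

# The "terms" attribute carries the formula and its environment
tt <- attr(mf, "terms")
class(tt)
environment(tt)
```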

Most model fitting routines, like lm(), glm() and mgcv::gam(), keep the model frame in the fitted object by default. The advantage is that if we later call predict without providing newdata, it finds the data for evaluation in this model frame. The clear disadvantage is that it substantially increases the size of the fitted object.

However, survival::coxph() is an exception. By default it does not retain the model frame in the fitted object. This makes the fitted object much smaller, but it exposes you to the problem you have encountered. To ask survival::coxph() to keep the model frame, pass model = TRUE.
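The difference is visible in the fitted objects themselves. The sketch below assumes the survival package is installed and uses the same bladder2 data as the question; the object names are only illustrative:

```r
library(survival)
data(bladder, package = "survival")   # provides bladder2

form <- Surv(start, stop, event) ~ rx + number

fit_default <- coxph(form, data = bladder2, method = "breslow")
fit_keep    <- coxph(form, data = bladder2, method = "breslow", model = TRUE)

is.null(fit_default[["model"]])     # no model frame stored by default
is.data.frame(fit_keep[["model"]])  # model frame stored with model = TRUE

# lm() stores the model frame unless told otherwise
is.data.frame(lm(number ~ rx, data = bladder2)[["model"]])
```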

Test with survival::coxph()

library(survival); data(bladder)

my_function <- function(myformula, mydata, keep.mf = TRUE) {
  fit <- coxph(myformula, mydata, method = "breslow", model = keep.mf)
  survfit(fit)
  }

Now this function call will fail, as you have seen:

my_function(Surv(start, stop, event) ~ rx + number, bladder2, keep.mf = FALSE)

but this function call will succeed:

my_function(Surv(start, stop, event) ~ rx + number, bladder2, keep.mf = TRUE)

Same behaviour for lm()

We can actually demonstrate the same behaviour with lm():

## generate some toy data
foo <- data.frame(x = seq(0, 1, length = 20), y = seq(0, 1, length = 20) + rnorm(20, 0, 0.15))

## a wrapper function
bar <- function(myformula, mydata, keep.mf = TRUE) {
  fit <- lm(myformula, mydata, model = keep.mf)
  predict.lm(fit)
  }

Now this will succeed, because the model frame is kept:

bar(y ~ x - 1, foo, keep.mf = TRUE)

while this will fail, because the model frame is discarded:

bar(y ~ x - 1, foo, keep.mf = FALSE)
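The failing call breaks because the stored call lm(formula = myformula, data = mydata, ...) is re-evaluated in the formula's environment, where mydata does not exist. One possible way around it, shown here only as a sketch and not as something the answer proposes, is to point the formula's environment at the wrapper's own frame so that the later lookup succeeds. The wrapper name bar_env is hypothetical:

```r
set.seed(42)
foo <- data.frame(x = seq(0, 1, length.out = 20))
foo$y <- foo$x + rnorm(20, 0, 0.15)

# Hypothetical variant of bar(): the model frame is discarded, but the formula
# is re-homed to this function's frame, where `mydata` is defined.
bar_env <- function(myformula, mydata) {
  environment(myformula) <- environment()
  fit <- lm(myformula, mydata, model = FALSE)
  predict(fit)   # rebuilds the model frame from the stored call
}

p <- bar_env(y ~ x - 1, foo)
length(p)
```

A side effect of this choice is that any variable in the formula that is not a column of mydata is now looked up from the wrapper's frame first, which may or may not be what the caller expects.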






Using argument newdata?

Note that my example for lm() is slightly artificial, because we can use the newdata argument of predict.lm() to get around the problem:

bar1 <- function(myformula, mydata, keep.mf = TRUE) {
  fit <- lm(myformula, mydata, model = keep.mf)
  predict.lm(fit, newdata = lapply(mydata, mean))
  }

Now both calls succeed, whether or not we keep the model frame:

bar1(y ~ x - 1, foo, keep.mf = TRUE)
bar1(y ~ x - 1, foo, keep.mf = FALSE)

Then you may wonder: can we do the same for survfit()?

survfit() is a generic function; in your code you are really calling survfit.coxph(). This function does have a newdata argument. The documentation reads:


newdata:

a data frame with the same variable names as those that appear in the 'coxph' formula. ... ... Default is the mean of the covariates used in the 'coxph' fit.

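Let's try:

my_function1 <- function(myformula, mydata) {
  ## fit without keeping the model frame (the coxph default)
  fit <- coxph(myformula, mydata, method = "breslow")
  ## call the method directly and supply covariate means as newdata
  survival:::survfit.coxph(fit, newdata = lapply(mydata, mean))
  }

and we would hope this works:

my_function1(Surv(start, stop, event) ~ rx + number, bladder2)

But: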

Error in is.data.frame(data) (from #5) : object 'mydata' not found

1: my_function1(Surv(start, stop, event) ~ rx + number, bladder2)
2: #5: survival:::survfit.coxph(fit, lapply(mydata, mean))
3: stats::model.frame(object)
4: model.frame.coxph(object)
5: eval(temp, environment(formula$terms), parent.frame())
6: eval(expr, envir, enclos)
7: stats::model.frame(formula = Surv(start, stop, event) ~ rx + number, data =
8: model.frame.default(formula = Surv(start, stop, event) ~ rx + number, data 
9: is.data.frame(data)

Note that although we pass in newdata, it is not used in the construction of the model frame:

3: stats::model.frame(object)

Only object, the fitted model, is passed to model.frame(), which then rebuilds the model frame from the stored call.

This is very different from what happens in predict.lm(), predict.glm() and mgcv:::predict.gam(). In these routines, newdata is passed to model.frame.default(). For example, predict.lm() contains:

m <- model.frame(Terms, newdata, na.action = na.action, xlev = object$xlevels)
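To see why this route never needs the original data, the same step can be imitated by hand. This is a simplified sketch of the idea, not the full predict.lm() code; the toy data and object names are assumptions for illustration:

```r
toy <- data.frame(x = c(0, 0.5, 1, 1.5), y = c(0.1, 0.4, 1.1, 1.4))
fit <- lm(y ~ x, data = toy, model = FALSE)   # model frame discarded

new <- data.frame(x = c(0.25, 0.75))

# Build the model frame from newdata only, using the terms without the response
Terms <- delete.response(terms(fit))
m <- model.frame(Terms, new, xlev = fit$xlevels)
X <- model.matrix(Terms, m)

manual <- drop(X %*% coef(fit))
same <- all.equal(unname(manual), unname(predict(fit, newdata = new)))
same
```

Because the model frame is built from Terms and newdata alone, nothing has to be looked up in the environment where the model was originally fitted.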

I don't use the survival package, so I am not sure how newdata works there. I think we really need an expert to explain this.
