purrr::map and glm - issues with call
This issue is related to the question "Pipe '.' dot causes trouble in glm call".
purrr::map is wonderful for subgroup analysis and/or model comparison. However, when it is used with glm(), the stored call is messed up, which causes issues, e.g. when computing pseudo-R2s. The reason is that update() doesn't work with the ugly call, and thus pscl::pR2 cannot compute the log-likelihood of the base model.
pacman::p_load(tidyverse)
# sample data
pacman::p_load(ISLR)
Default <- ISLR::Default
# nest data: students and non-students
Default_nested <- Default %>% group_by(student) %>% nest()
# fit one glm per subgroup
formul <- default ~ income + balance
glms <- Default_nested %>%
  mutate(model = map(data, glm, formula = formul, family = "binomial"))
# pscl::pR2 throws an error
pacman::p_load(pscl)
glms %>% mutate(pr2 = map(model, pR2))
Now we can take a look at the first submodel. The call looks strange (formula=..1) even though formula contains the right formula.
> glms$model[[1]]$call
.f(formula = ..1, family = "binomial", data = .x[[i]])
> glms$model[[1]]$formula
default ~ income + balance
> glms$model[[1]]$data
# A tibble: 7,056 x 3
default balance income
<fct> <dbl> <dbl>
1 No 730. 44362.
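The odd call arises because glm() records how it was invoked (via match.call()), and inside a mapping function it is invoked through placeholder names rather than as glm(...) with real objects. The sketch below reproduces the effect with base R's lapply() on simulated data instead of purrr::map and ISLR::Default; the data frame `sim` and its variable names are made up for illustration.

```r
# Simulated two-group data (hypothetical stand-in for the nested Default data)
set.seed(1)
sim <- data.frame(
  group = rep(c("a", "b"), each = 200),
  x     = rnorm(400)
)
sim$y <- rbinom(400, 1, plogis(0.5 * sim$x))

# Fit one logistic regression per group through a mapping function
form <- y ~ x
fits <- lapply(split(sim, sim$group), glm, formula = form, family = "binomial")

# The stored call refers to lapply()'s internals, not to glm() and real objects
print(fits$a$call)

# update() re-evaluates that stored call, so it fails outside of lapply()
res <- try(update(fits$a, . ~ 1), silent = TRUE)
update_failed <- inherits(res, "try-error")
update_failed
```

The formula and data stored in the fitted object are still correct; only the recorded call is unusable for re-evaluation.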
What is the cleanest way to use pscl::pR2 when you have many glm objects in your tibble (more than the 2 in this example)?
Edit:
Overview of solution strategies:
(A) "fix" the glm object, so that update() can be applied to it:
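glms %>%
  mutate(model = map(model, function(x) {
    # overwrite the stored call with one that update() can re-evaluate
    x$call <- rlang::call2("glm", formula = x$formula, data = quote(Default), family = "binomial")
    x
  })) %>%
  mutate(pr2 = map(model, pR2)) %>%
  unnest(pr2)
This 'runs', but the computed R2 is off, presumably because the rewritten call points at the full Default data set rather than at the subgroup's data. So this solution strategy is probably a dead end.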
(B) Write a wrapper for glm(), as proposed by Artem. This should work fine. Downside: the calls look ugly.
I expanded on Artem's proposed solution to create glm3.
glm3 <- function(formula, data, family) {
  eval(rlang::expr(glm(!!rlang::enexpr(data),
                       formula = !!formula,
                       family = !!family)))
}
glms3 <- Default_nested %>%
  mutate(model = map(data, glm3, formula = formul, family = "binomial"),
         pr2 = map(model, pR2))
glms3 %>% unnest(pr2)
(C) In this particular case (pseudo-R2s), simply write a better pseudo-R2 function. Since it's probably the only major statistic that doesn't work within purrr::map, this may actually make sense. I put together the psr2glm function.
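psr2glm <- function(glmobj) {
  # Intercept-only ("base") model, refitted directly from the pieces stored in
  # the glm object instead of via update(). The gsub() keeps everything before
  # the first space of the deparsed formula, so it assumes a simple,
  # single-name response such as `default`.
  response <- gsub(" .*$", "", deparse(glmobj$formula)[1])
  L.base <- as.numeric(logLik(
    glm(formula = reformulate("1", response),
        data = glmobj$data,
        family = glmobj$family)))
  n <- length(glmobj$residuals)
  # as.numeric() drops the logLik class so the results are plain numbers
  L.full <- as.numeric(logLik(glmobj))
  D.full <- -2 * L.full
  D.base <- -2 * L.base
  G2 <- -2 * (L.base - L.full)
  data.frame(McFadden = 1 - L.full / L.base,
             CoxSnell = 1 - exp(-G2 / n),
             Nagelkerke = (1 - exp((D.full - D.base) / n)) / (1 - exp(-D.base / n)))
}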
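Written out, the three statistics returned by psr2glm are as follows, where L_full and L_base are the log-likelihoods of the fitted and the intercept-only model, D = -2L, and n is the number of observations. This is only a transcription of the function's arithmetic into notation, not an independent derivation.

```latex
\begin{aligned}
R^2_{\text{McFadden}} &= 1 - \frac{L_{\text{full}}}{L_{\text{base}}} \\
G^2 &= -2\,(L_{\text{base}} - L_{\text{full}}) \\
R^2_{\text{CoxSnell}} &= 1 - \exp\!\left(-\frac{G^2}{n}\right) \\
R^2_{\text{Nagelkerke}} &= \frac{1 - \exp\!\left(\dfrac{D_{\text{full}} - D_{\text{base}}}{n}\right)}{1 - \exp\!\left(-\dfrac{D_{\text{base}}}{n}\right)},
\qquad D = -2L
\end{aligned}
```

Since D_full - D_base = -G^2, the Nagelkerke numerator is the Cox-Snell value, rescaled by its maximum attainable value in the denominator.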
It works:
glms <- Default_nested %>%
  mutate(model = map(data, glm, formula = formul, family = "binomial"))
glms %>% mutate(pr2 = map(model, psr2glm)) %>% unnest(pr2)
I am considering proposing changes to DescTools::PseudoR2; however, I first need to check whether the solution is general.
The key to this idea is to skip update() and instead call glm() directly. All required pieces of information are within the glm object, even when it was fitted within purrr::map.
A nice side effect of using psr2glm: the output of unnest() looks nice.
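To illustrate that the fitted object carries everything needed, here is a minimal sketch that refits the intercept-only model from the stored formula, data and family without touching the stored call. It uses base lapply() and simulated data rather than purrr::map and Default; `sim`, `null_loglik` and the variable names are hypothetical.

```r
# Simulated two-group data (hypothetical stand-in for the nested Default data)
set.seed(42)
sim <- data.frame(
  group = rep(c("a", "b"), each = 250),
  x     = rnorm(500)
)
sim$y <- rbinom(500, 1, plogis(-0.3 + 0.8 * sim$x))

# Fitting through lapply() leaves an unusable call in each object
fits <- lapply(split(sim, sim$group), glm, formula = y ~ x, family = "binomial")

# Refit the intercept-only model from pieces stored in the glm object
null_loglik <- function(fit) {
  # update() is applied to the formula here, not to the model object
  null_formula <- update(formula(fit), . ~ 1)
  null_fit <- glm(null_formula, data = fit$data, family = fit$family)
  as.numeric(logLik(null_fit))
}

# McFadden's pseudo-R2 per subgroup
mcfadden <- sapply(fits, function(fit) {
  1 - as.numeric(logLik(fit)) / null_loglik(fit)
})
mcfadden
```

For a 0/1 response the result can be cross-checked against 1 - deviance / null.deviance, both of which glm() already stores in the object.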
(D) Change either glm() or update(). Given that the glm object actually contains all necessary information, one could consider the observed behavior a bug. So it should be fixed in base R.
One way is to define a wrapper for glm() that puts the data directly inside the call by manually constructing the expression and then evaluating it:
glm2 <- function(.df, ...) {
  eval(rlang::expr(glm(!!rlang::enexpr(.df), !!!list(...))))
}
glms <- Default_nested %>%
  mutate(model = map(data, glm2, formula = formul, family = "binomial"),
         pr2 = map(model, pscl::pR2))
# # A tibble: 2 x 4
# student data model pr2
# <fct> <list> <list> <list>
# 1 No <tibble [7,056 × 3]> <glm> <dbl [6]>
# 2 Yes <tibble [2,944 × 3]> <glm> <dbl [6]>
Validation:
## Perform the computation by hand and ensure that it's identical to glms$pr2
glm(Default_nested$data[[1]], formula=default~income+balance, family="binomial") %>%
pscl::pR2() %>% identical( glms$pr2[[1]] ) # TRUE
glm(Default_nested$data[[2]], formula=default~income+balance, family="binomial") %>%
pscl::pR2() %>% identical( glms$pr2[[2]] ) # TRUE
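For comparison, a similar effect can be sketched in base R without rlang: do.call() with already-evaluated arguments also builds a call in which the data are inlined, so update() can re-evaluate it later. This is an alternative sketch, not the wrapper from the answer; `glm_inline` and the simulated `sim` data are hypothetical.

```r
# Hypothetical base-R wrapper: the evaluated data frame and the remaining
# arguments become part of the call that glm() records
glm_inline <- function(.df, ...) {
  do.call("glm", c(list(data = .df), list(...)))
}

# Simulated two-group data (stand-in for the nested Default data)
set.seed(7)
sim <- data.frame(
  group = rep(c("a", "b"), each = 200),
  x     = rnorm(400)
)
sim$y <- rbinom(400, 1, plogis(0.5 * sim$x))

fits <- lapply(split(sim, sim$group), glm_inline,
               formula = y ~ x, family = "binomial")

# update() works now because the stored call is self-contained
null_a <- update(fits$a, . ~ 1)
check <- all.equal(deviance(null_a), fits$a$null.deviance, tolerance = 1e-6)
check
```

The trade-off is the same as with the rlang version: the printed call becomes very long because the whole data frame is embedded in it.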