purrr::map and glm - issues with call


Problem description


This issue is related to the question "Pipe '.' dot causes trouble in glm call".

purrr::map is wonderful for subgroup analysis and/or model comparison. However, when it is used with glm, the call stored in the fitted object is mangled, which causes problems, e.g. when computing pseudo-R2s. The reason is that update does not work with the mangled call, and thus pscl::pR2 cannot compute the log-likelihood of the base model.

pacman::p_load(tidyverse)

#sample data
pacman::p_load(ISLR)
mydata = ISLR::Default

# nest the data: students and non-students
Default_nested = Default %>% group_by(student) %>% nest()

# fit one glm per subgroup
formul = default ~ income + balance

glms = Default_nested %>%
  mutate(model = map(data, glm, formula = formul, family = 'binomial'))

# pscl::pR2 throws an error
pacman::p_load(pscl)
glms %>% mutate(pr2 = map(model, pR2))

Now we can take a look at the first submodel. The call looks strange (formula = ..1), even though the formula element holds the right formula.

> glms$model[[1]]$call
.f(formula = ..1, family = "binomial", data = .x[[i]])
> glms$model[[1]]$formula
default ~ income + balance
> glms$model[[1]]$data
# A tibble: 7,056 x 3
   default balance income
   <fct>     <dbl>  <dbl>
 1 No         730. 44362.
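
The same kind of mangled call can be reproduced without purrr. The sketch below uses base R's lapply and the built-in mtcars data as stand-ins (both are assumptions for illustration, not part of the original example). Like map, lapply passes glm under a placeholder name, so the stored call cannot be re-evaluated by update.

```r
# Stand-in data: two subgroups of the built-in mtcars data set
groups <- split(mtcars, mtcars$vs)

# Fit one logistic regression per subgroup, passing glm as a function argument
fits <- lapply(groups, glm, formula = am ~ mpg, family = "binomial")

# The stored call refers to lapply's internal names (FUN, X[[i]])
print(fits[[1]]$call)

# update() re-evaluates that call, which fails outside of lapply
res <- tryCatch(update(fits[[1]], . ~ 1), error = function(e) e)
print(inherits(res, "error"))
```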

What is the cleanest way to use pscl::pR2 when you have many glm objects in your tibble (more than the two in this example)?

Edit:

Overview of solution strategies:

(A) "fix" the glm object, so that update can be applied to it:

glms %>% mutate(model = map(model,function(x){x$call = call2("glm",formula=x$formula,data=quote(Default),family='binomial');x})) %>%
  mutate(pr2=map(model,pR2)) %>% unnest(pr2)

This runs; however, the computed R2 is off, so this solution strategy is probably a dead end.
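
One plausible reason the numbers are off: the patched call names the full data set, so update refits the null model on all rows instead of on the subgroup. A minimal base-R sketch of that effect, with lapply and mtcars as illustrative stand-ins for map and Default (assumptions, not the original data):

```r
groups <- split(mtcars, mtcars$vs)
fits   <- lapply(groups, glm, formula = am ~ mpg, family = "binomial")

m <- fits[[1]]
# Patch the call as in strategy (A): it now points at the full data set
m$call <- call("glm", formula = m$formula, data = quote(mtcars),
               family = "binomial")

# update() now runs, but the null model is fitted on every row of mtcars
null_patched <- update(m, . ~ 1)

print(nobs(m))             # rows in the subgroup
print(nobs(null_patched))  # rows in the full data set
```

A null model fitted on different rows than the full model gives a log-likelihood that is not comparable, which would distort any pseudo-R2 built from the two.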

(B) Write a wrapper for glm, as proposed by Artem. This should work fine. Downside: the calls look ugly.

I expanded on Artem's proposed solution to create glm3.

glm3 <- function(formula,data,family) {  
  eval(rlang::expr( glm(!!rlang::enexpr(data),
                        formula=!!formula,
                        family=!!family ) ))}

glms3 <- Default_nested %>% mutate( model=map(data,glm3,formula=formul,family='binomial'),pr2=map(model,pR2) )
glms3 %>% unnest(pr2)

(C) In this particular case (pseudo R2s), simply write a better pseudo-r2 function. Since it's probably the only major statistic that doesn't work within purrr::map, this may actually make sense. I put together the psr2glm function.

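psr2glm=function(glmobj){

  # Log-likelihood of the null (intercept-only) model, refitted directly from
  # the data stored in the glm object; the response name is taken as the
  # formula text before the first space
  L.base=
    logLik(
      glm(formula = reformulate('1',gsub( " .*$", "", deparse(glmobj$formula) )),
          data=glmobj$data,
          family = glmobj$family))

  # number of observations
  n=length(glmobj$residuals)

  # log-likelihood of the fitted model, and the deviances of both models
  L.full=logLik(glmobj)
  D.full <- -2 * L.full
  D.base <- -2 * L.base
  G2 <- -2 * (L.base - L.full)

  return(data.frame(McFadden = 1-L.full/L.base, 
                    CoxSnell = 1 - exp(-G2/n),
                    Nagelkerke = (1 - exp((D.full - D.base)/n))/(1 - exp(-D.base/n))))

}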

It works:

glms = Default_nested %>% 
  mutate(model=map(data,glm,formula=formul,family='binomial')) 
glms %>% mutate(pr2=map(model,psr2glm)) %>% unnest(pr2)

I am considering proposing changes to DescTools::PseudoR2; however, I first need to check whether the solution is general.

The key to this idea is to skip update and call glm directly instead. All the required information is stored in the glm object, even when the model was fitted inside purrr::map. A nice side effect of using psr2glm: the output of unnest looks nice.
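
For reference, the three statistics returned by psr2glm can be written out as below. The symbols are notation introduced here, mapped from the code: \ell_{\text{full}} and \ell_{\text{base}} are the log-likelihoods L.full and L.base, D = -2\ell is the corresponding deviance-style quantity, and n is the number of observations.

```latex
\begin{aligned}
R^2_{\text{McFadden}}   &= 1 - \frac{\ell_{\text{full}}}{\ell_{\text{base}}} \\
G^2                     &= -2\,(\ell_{\text{base}} - \ell_{\text{full}}) \\
R^2_{\text{CoxSnell}}   &= 1 - \exp\!\left(-\frac{G^2}{n}\right) \\
R^2_{\text{Nagelkerke}} &= \frac{1 - \exp\!\left(\dfrac{D_{\text{full}} - D_{\text{base}}}{n}\right)}
                                {1 - \exp\!\left(-\dfrac{D_{\text{base}}}{n}\right)},
                           \qquad D = -2\,\ell
\end{aligned}
```

Since D_{\text{full}} - D_{\text{base}} = -G^2, the Nagelkerke numerator equals the Cox-Snell value, rescaled by its maximum attainable value in the denominator.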

(D) Change either glm or update. Given that the glm object actually contains all necessary information, one could consider the observed behavior a bug. So it should be fixed in base R.

Solution

One way is to define a wrapper for glm() that puts data directly inside the call by manually constructing the expression and then evaluating it:

glm2 <- function(.df, ...) {
  eval(rlang::expr(glm(!!rlang::enexpr(.df),!!!list(...)))) }

glms <- Default_nested %>%
    mutate( model = map(data,glm2,formula=formul,family="binomial"),
            pr2   = map(model,pscl::pR2) )
# # A tibble: 2 x 4
#   student data                 model  pr2      
#   <fct>   <list>               <list> <list>   
# 1 No      <tibble [7,056 × 3]> <glm>  <dbl [6]>
# 2 Yes     <tibble [2,944 × 3]> <glm>  <dbl [6]>

Validation:

## Perform the computation by hand and ensure that it's identical to glms$pr2
glm(Default_nested$data[[1]], formula=default~income+balance, family="binomial") %>%
  pscl::pR2() %>% identical( glms$pr2[[1]] )     # TRUE
glm(Default_nested$data[[2]], formula=default~income+balance, family="binomial") %>%
  pscl::pR2() %>% identical( glms$pr2[[2]] )     # TRUE
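
The same idea can be sketched without rlang. The hypothetical helper glm_inline below (not part of the answer) builds the call with base R's as.call and inlines the data frame itself, so that update can later re-evaluate the call. As an assumption for illustration, lapply and mtcars stand in for map and Default.

```r
# Hypothetical base-R analogue of glm2: inline the data into the call
glm_inline <- function(.df, ...) {
  eval(as.call(c(list(quote(glm), data = .df), list(...))))
}

groups <- split(mtcars, mtcars$vs)
fits   <- lapply(groups, glm_inline, formula = am ~ mpg, family = "binomial")

# update() works because the stored call carries its own data
null1 <- update(fits[[1]], . ~ 1)
print(nobs(null1) == nrow(groups[[1]]))

# McFadden's pseudo-R2 computed from the two log-likelihoods
mcfadden <- 1 - as.numeric(logLik(fits[[1]])) / as.numeric(logLik(null1))
print(mcfadden)
```

The trade-off is the one noted in the question: because the whole data frame sits inside the call, printing the call is verbose.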
