purrr::map and glm - issues with call

Question

This question is related to the pipe's '.': the dot causes problems with the glm call.

purrr::map is wonderful for subgroup analysis and/or model comparison. However, when it is used with glm, the stored call gets mangled, which causes problems, e.g. when computing pseudo-R2s. The reason is that update doesn't work with the mangled call, so pscl::pR2 cannot compute the log-likelihood of the base model.

pacman::p_load(tidyverse)

#sample data
pacman::p_load(ISLR)
mydata = ISLR::Default

#nest data, students and non-students
Default_nested = mydata %>% group_by(student) %>% nest()

#fit glms
formul= default ~income+balance

glms = Default_nested %>% 
  mutate(model=map(data,glm,formula=formul,family='binomial')) 

#pscl::pR2 throwing error
pacman::p_load(pscl)
glms %>% mutate(pr2=map(model,pR2))

Now we can take a look at the first submodel. The call looks strange (formula = ..1), even though the formula component contains the right formula.

> glms$model[[1]]$call
.f(formula = ..1, family = "binomial", data = .x[[i]])
> glms$model[[1]]$formula
default ~ income + balance
> glms$model[[1]]$data
# A tibble: 7,056 x 3
   default balance income
   <fct>     <dbl>  <dbl>
 1 No         730. 44362.
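
The same failure mode can be reproduced without purrr, which may help to see what goes wrong. The sketch below is only an analogue under stated assumptions: `fit_in_function` and `local_data` are hypothetical names, and the built-in mtcars data stands in for Default. The stored call refers to a name that only existed inside the fitting function, so update cannot re-evaluate it, even though the fitted object still carries the data.

```r
# Hypothetical helper: fits a glm on data that is only known under a local name.
fit_in_function <- function(local_data) {
  glm(am ~ wt, data = local_data, family = binomial)
}

m <- fit_in_function(mtcars)

# The stored call refers to `local_data`, which does not exist at top level.
print(m$call)

# update() rebuilds and re-evaluates that call, so the refit fails ...
refit <- try(update(m, . ~ 1), silent = TRUE)
print(inherits(refit, "try-error"))

# ... although the fitted object itself still holds the data it was fit on.
print(nrow(m$data))
```

With purrr::map the unresolved pieces are `.f`, `..1` and `.x[[i]]` instead of `local_data`, but the mechanism looks the same.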

What is the cleanest way to be able to use pscl::pR2 when you have many (more than the 2 in this example) glm objects in your tibble?

Overview of solution strategies:

(A) "Fix" the glm object so that update can be applied to it:

glms %>% mutate(model = map(model,function(x){x$call = rlang::call2("glm",formula=x$formula,data=quote(Default),family='binomial');x})) %>%
  mutate(pr2=map(model,pR2)) %>% unnest(pr2)

This "runs", but the computed R2 is off, so this solution strategy is probably a dead end.
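
One plausible reason for the wrong values, assuming the null model is refit through update on the rewritten call: that call points at the full data set, so the intercept-only model is fit on all rows rather than on the subgroup. The sketch below only illustrates how much the two null log-likelihoods can differ; mtcars split by `vs` is a stand-in for Default split by student, and the object names are hypothetical.

```r
# Stand-in for one subgroup (here: the rows of mtcars with vs == 1).
subgroup <- mtcars[mtcars$vs == 1, ]

# Intercept-only model on the subgroup (what a pseudo-R2 for that group needs) ...
null_sub <- glm(am ~ 1, data = subgroup, family = binomial)

# ... versus the intercept-only model on the full data (what a call with the
# full data set hard-coded into it would refit).
null_full <- glm(am ~ 1, data = mtcars, family = binomial)

ll <- c(subgroup = as.numeric(logLik(null_sub)),
        full     = as.numeric(logLik(null_full)))
print(ll)
```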

(B) Write a wrapper for glm, as proposed by Artem. This should work fine. Downside: the calls look ugly.

I expanded on Artem's proposed solution to create glm3.

glm3 <- function(formula,data,family) {  
  eval(rlang::expr( glm(!!rlang::enexpr(data),
                        formula=!!formula,
                        family=!!family ) ))}

glms3 <- Default_nested %>% mutate( model=map(data,glm3,formula=formul,family='binomial'),pr2=map(model,pR2) )
glms3 %>% unnest(pr2)

(C) In this particular case (pseudo-R2s), simply write a better pseudo-R2 function. Since it is probably the only major statistic that doesn't work within purrr::map, this may actually make sense. I put together the psr2glm function.

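psr2glm = function(glmobj) {

  # refit the intercept-only model directly instead of going through update();
  # formula[[2]] is the response, which also survives long formulas that
  # deparse() would split across several strings
  L.base = as.numeric(logLik(
    glm(formula = reformulate('1', response = glmobj$formula[[2]]),
        data = glmobj$data,
        family = glmobj$family)))

  n = length(glmobj$residuals)

  # as.numeric() drops the logLik attributes so the result is a plain data.frame
  L.full = as.numeric(logLik(glmobj))
  D.full <- -2 * L.full
  D.base <- -2 * L.base
  G2 <- -2 * (L.base - L.full)

  return(data.frame(McFadden = 1 - L.full/L.base,
                    CoxSnell = 1 - exp(-G2/n),
                    Nagelkerke = (1 - exp((D.full - D.base)/n))/(1 - exp(-D.base/n))))

}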

It works:

glms = Default_nested %>% 
  mutate(model=map(data,glm,formula=formul,family='binomial')) 
glms %>% mutate(pr2=map(model,psr2glm)) %>% unnest(pr2)

I am considering proposing changes to DescTools::PseudoR2, but I first need to check whether the solution is general.

The key to this idea is to skip update and call glm directly instead. All the required pieces of information are inside the glm object, even when it was fit within purrr::map. A nice side effect of using psr2glm: unnest's output looks nice.
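
As a side note on skipping update: for the specific case of an ungrouped 0/1 response, fitted with an intercept and without an offset, the glm object already stores both deviances, and McFadden's measure can be read off without any refit. This is a sketch under exactly those assumptions, not a replacement for psr2glm; `mcfadden_from_deviance` is a hypothetical helper and mtcars stands in for the Default data.

```r
# Assumption: ungrouped 0/1 response, intercept in the model, no offset.
# Then the deviance equals -2 * logLik, and null.deviance is the deviance of
# the intercept-only model, so no refit is required.
mcfadden_from_deviance <- function(glmobj) {
  1 - glmobj$deviance / glmobj$null.deviance
}

full <- glm(am ~ wt, data = mtcars, family = binomial)
null <- glm(am ~ 1,  data = mtcars, family = binomial)

# Cross-check against the definition based on an explicit null-model refit.
by_refit <- 1 - as.numeric(logLik(full)) / as.numeric(logLik(null))
print(all.equal(mcfadden_from_deviance(full), by_refit, tolerance = 1e-6))
```

Outside those assumptions (grouped binomial counts, offsets, other families) the shortcut should not be relied on, and refitting the null model as psr2glm does is the safer route.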

(D) Change either glm or update. Given that the glm object actually contains all the necessary information, one could consider the observed behavior a bug, in which case it should be fixed in base R.

Recommended answer

One way is to define a wrapper for glm() that puts the data directly inside the call, by manually constructing the expression and then evaluating it:

glm2 <- function(.df, ...) {
  eval(rlang::expr(glm(!!rlang::enexpr(.df),!!!list(...)))) }

glms <- Default_nested %>%
    mutate( model = map(data,glm2,formula=formul,family="binomial"),
            pr2   = map(model,pscl::pR2) )
# # A tibble: 2 x 4
#   student data                 model  pr2      
#   <fct>   <list>               <list> <list>   
# 1 No      <tibble [7,056 × 3]> <glm>  <dbl [6]>
# 2 Yes     <tibble [2,944 × 3]> <glm>  <dbl [6]>

Verification:

## Perform the computation by hand and ensure that it's identical to glms$pr2
glm(Default_nested$data[[1]], formula=default~income+balance, family="binomial") %>%
  pscl::pR2() %>% identical( glms$pr2[[1]] )     # TRUE
glm(Default_nested$data[[2]], formula=default~income+balance, family="binomial") %>%
  pscl::pR2() %>% identical( glms$pr2[[2]] )     # TRUE
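
A base-R variant of the same idea is sketched below, assuming you would rather not depend on rlang: bquote() splices the formula, family and data frame into the call before it is evaluated. `glm_inline` is a hypothetical name, mtcars split by `vs` stands in for the nested Default data, and this is an illustration rather than the method used in the answer above.

```r
# Hypothetical wrapper: build the glm call with the actual objects spliced in,
# then evaluate it, so the stored call no longer depends on any outside names.
glm_inline <- function(.df, formula, family) {
  eval(bquote(glm(formula = .(formula), family = .(family), data = .(.df))))
}

# One model per subgroup, here with lapply() in place of purrr::map().
fits <- lapply(split(mtcars, mtcars$vs), glm_inline,
               formula = am ~ wt, family = "binomial")

# The data frame itself now sits inside each stored call ...
print(is.data.frame(fits[[1]]$call$data))

# ... so update() can refit the intercept-only model on the right subgroup.
null_fits <- lapply(fits, function(m) update(m, . ~ 1))
print(vapply(null_fits, nobs, numeric(1)))
```

The trade-off is the same as with the rlang version: printing the call dumps the whole data frame, so the calls look ugly.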
