partykit: justify text in terminal node when unequal regressors' name lengths are included


Problem description

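I am trying to edit the aesthetics of the terminal nodes in order to:

  1. Enlarge the boxes so that the full regressor names fit inside.

  2. If possible, justify the text inside the boxes when the regressors' names have unequal lengths, so that each terminal node looks like a table.

Below is my attempt using the gp options (fontsize = 10, boxwidth = 10), but I suspect I am using the wrong aesthetics options.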

The mysummary function is taken from the Stack Overflow question linked in the code comment below.



    
    library("partykit")
    
    set.seed(1234L)
    data("PimaIndiansDiabetes", package = "mlbench")
    ## a simple basic fitting function (of type 1) for a logistic regression
    logit <- function(y, x, start = NULL, weights = NULL, offset = NULL, ...) {
      glm(y ~ 0 + x, family = binomial, start = start, ...)
    }
    
    
    ## Long name regressors
    PimaIndiansDiabetes$looooong_name_1 <- rnorm(nrow(PimaIndiansDiabetes))
    PimaIndiansDiabetes$looooong_name_2 <- rnorm(nrow(PimaIndiansDiabetes))
    ## Short name regressor
    PimaIndiansDiabetes$short_name <- rnorm(nrow(PimaIndiansDiabetes))
    
    
    ## set up a logistic regression tree
    pid_tree <- mob(diabetes ~ glucose        + 
                              looooong_name_1 +
                              looooong_name_2 +
                              short_name      | 
                              pregnant + pressure + triceps + insulin +
                              mass + pedigree + age, data = PimaIndiansDiabetes, fit = logit)
    
    ## Summary function from: https://stackoverflow.com/questions/65495322/partykit-modify-terminal-node-to-include-standard-deviation-and-significance-of/65500344#65500344
    mysummary <- function(info, digits = 2) {
      n <- info$nobs
      na <- format(names(coef(info$object)))
      cf <- format(coef(info$object), digits = digits)
      se <- format(sqrt(diag(vcov(info$object))), digits = digits)
      t <- format(coef(info$object) / sqrt(diag(vcov(info$object))), digits = digits)
    
      c(paste("n =", n),
        paste("Regressor","beta" ,"[", "t-ratio" ,"]"),
        paste(na, cf, "[",t,"]")
      )
    }
    
    #plot tree
    plot(pid_tree,
         terminal_panel = node_terminal,
         tp_args = list(FUN = mysummary, fill = "white"),
         gp = gpar(fontsize = 10,
                   boxwidth = 10,           ## apparently this option doesn't belong here,
                   margins = rep(0.01, 4))) ## and neither does this one.
    
    
    

This is what I am getting:

but I would like to get something like the following:

Thanks a lot.

Solution

A simple and basic solution is to use a fixed-width (monospaced) font like Courier or Inconsolata:

    plot(pid_tree, terminal_panel = node_terminal,
      tp_args = list(FUN = mysummary, fill = "white"),
      gp = gpar(fontfamily = "inconsolata"))
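
A fixed-width font only lines up text that has already been padded to equal widths. In mysummary above, format() pads the regressor names and the numbers, but the header row is pasted together separately, so it may not sit over its columns. The following is a minimal sketch of one way to pad the header together with the rows. The name mysummary_aligned is hypothetical (it is not part of partykit), and the info list is a stand-in built from a plain glm() fit on the built-in mtcars data, assuming only that a terminal node's info carries $nobs and $object as mysummary already requires.

```r
## Hypothetical variant of mysummary (not part of partykit): the header is
## padded together with the data rows, so each column has one common width.
mysummary_aligned <- function(info, digits = 2) {
  cf <- coef(info$object)
  tr <- cf / sqrt(diag(vcov(info$object)))
  ## format() pads all elements of a character vector to the same width
  na <- format(c("Regressor", names(cf)))                 # left-justified
  be <- format(c("beta", format(cf, digits = digits)), justify = "right")
  tt <- format(c("t-ratio", format(tr, digits = digits)), justify = "right")
  c(paste("n =", info$nobs), paste(na, be, "[", tt, "]"))
}

## Stand-in for a terminal node's info list (assumption: only $nobs and
## $object are needed, as in mysummary). mtcars ships with base R.
fit  <- glm(am ~ 0 + wt + hp, data = mtcars, family = binomial)
info <- list(nobs = nobs(fit), object = fit)

rows <- mysummary_aligned(info)
writeLines(rows)  # header and coefficient rows all have the same length
```

If this behaves as intended, it can be passed as FUN = mysummary_aligned in tp_args. Besides "inconsolata", which has to be installed and known to the graphics device, R's generic family "mono" should select a fixed-width font on the standard devices.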
    

In addition to this simple text-based table, you can also produce more elaborate tables, e.g., via ggplot2 and gtable as in the following plot taken from: Seibold, Hothorn, Zeileis (2019). "Generalised Linear Model Trees with Global Additive Effects." Advances in Data Analysis and Classification, 13, 703-725. doi:10.1007/s11634-018-0342-1

The code is a little bit involved but available in the replication materials of the article. Specifically, you need these two files:
