Is there a neat approach to label a ggplot plot with the equation and other statistics from geom_quantile()?


Question

I'd like to include the relevant statistics from a geom_quantile() fitted line in a similar way to how I would for a geom_smooth(method="lm") fitted linear regression (where I've previously used ggpmisc which is awesome). For example, this code:

# quantile regression example with ggpmisc equation
# basic quantile code from here:
# https://ggplot2.tidyverse.org/reference/geom_quantile.html

library(tidyverse)
library(ggpmisc)
# see ggpmisc vignette for stat_poly_eq() code below:
# https://cran.r-project.org/web/packages/ggpmisc/vignettes/user-guide.html#stat_poly_eq

my_formula <- y ~ x
#my_formula <- y ~ poly(x, 3, raw = TRUE)

# linear ols regression with equation labelled
m <- ggplot(mpg, aes(displ, 1 / hwy)) +
  geom_point()

m + 
  geom_smooth(method = "lm", formula = my_formula) +
  # after_stat() replaces the deprecated stat() for accessing computed variables
  stat_poly_eq(aes(label = paste(after_stat(eq.label), "*\" with \"*",
                                 after_stat(rr.label), "*\", \"*",
                                 after_stat(f.value.label), "*\", and \"*",
                                 after_stat(p.value.label), "*\".\"",
                                 sep = "")),
               formula = my_formula, parse = TRUE, size = 3)

generates this:

For a quantile regression, you can swap out geom_smooth() for geom_quantile() and get a lovely quantile regression line plotted (in this case the median):

# quantile regression - no equation labelling
m + 
  geom_quantile(quantiles = 0.5)
  

How would you get the summary statistics into a label, or recreate them on the fly? That is, other than running the regression before the call to ggplot and then passing the results in as an annotation (similar to what was done here or here for a linear regression).

Solution

@mark-neal stat_fit_glance() does work with quantreg::rq(). Using stat_fit_glance() is, however, more involved: this stat does not "know" what to expect from glance(), so one has to assemble the label manually.

One needs to know what is available for this. One can either fit the model outside the ggplot and use glance() to find out which columns it returns, or do this within the ggplot with the help of package 'gginnards'. I will show the latter alternative, continuing from your code example above.

library(gginnards)

m + 
  geom_quantile(quantiles = 0.5) +
  stat_fit_glance(method = "rq", method.args = list(formula = y ~ x), geom = "debug")

By default, geom_debug() just prints its input to the R console; that input is what the statistic returns.

# A tibble: 1 x 11
   npcx  npcy   tau logLik    AIC    BIC df.residual     x      y PANEL group
  <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>       <int> <dbl>  <dbl> <fct> <int>
1    NA    NA   0.5   816. -1628. -1621.         232  1.87 0.0803 1        -1
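
The same column names can also be found without 'gginnards', by fitting the model outside the plot (the first option mentioned earlier). The following is only a minimal sketch, assuming packages 'quantreg', 'broom' and 'ggplot2' are installed; the object names fit and fit_summary are made up for illustration.

```r
# Sketch: inspect what glance() returns for a median regression fitted
# outside ggplot. Assumes 'quantreg', 'broom' and 'ggplot2' are installed.
library(quantreg)  # rq()
library(broom)     # glance()

# Same variables as in the plot: x = displ, y = 1 / hwy
fit <- rq(I(1 / hwy) ~ displ, tau = 0.5, data = ggplot2::mpg)

# One-row summary of the fit; its column names are what after_stat()
# can refer to inside stat_fit_glance()
fit_summary <- glance(fit)
names(fit_summary)
```

Compared with geom = "debug", this does not show the extra columns that the stat adds for positioning (npcx, npcy, x, y, PANEL, group).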

We can then access each of these columns using after_stat() (earlier incarnations being stat() and enclosing the names in double dots, as in ..tau..). We need to do the formatting using the format specifications of sprintf(). If, as in this case, we assemble a string that needs to be parsed into an expression, parse = TRUE is also needed.

m + 
  geom_quantile(quantiles = 0.5) +
  stat_fit_glance(method = "rq", method.args = list(formula = y ~ x), 
                  mapping = aes(label = sprintf('italic(tau)~"="~%.2f~~AIC~"="~%.3g~~BIC~"="~%.3g',
                                                after_stat(tau), after_stat(AIC), after_stat(BIC))),
                  parse = TRUE)

This example results in the following plot.
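
To see what the label looks like before it reaches the plot, the sprintf() call can be run on its own. This is a small sketch using base R only; the three numbers are illustrative values copied from the glance() output printed above, and the names label and label_expr are hypothetical.

```r
# Build the plotmath string with fixed, illustrative values
# (tau, AIC and BIC as printed by geom_debug() above)
label <- sprintf('italic(tau)~"="~%.2f~~AIC~"="~%.3g~~BIC~"="~%.3g',
                 0.5, -1628, -1621)
label

# parse = TRUE in the stat requires the string to be a valid R expression;
# parse() raises an error here if it is not
label_expr <- parse(text = label)
```

Checking the string this way makes it easier to find a stray quote or tilde than reading an error raised from inside the plot.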

With stat_fit_tidy() the same approach should have worked. However, in 'ggpmisc' (<= 0.3.7) it worked with "lm" but not with "rq". This bug is fixed in 'ggpmisc' (>= 0.3.8), now on CRAN.

The example below works only with 'ggpmisc' (>= 0.3.8)

The remaining question is whether the tibble that glance() or tidy() returns contains the information one wants to add to the plot, which does not seem to be the case for tidy.rq(), at least by default. However, tidy.rq() has a parameter se.type that determines the values returned in the tibble. The revised stat_fit_tidy() accepts named arguments to be passed to tidy(), making the following possible.

m + 
  geom_quantile(quantiles = 0.5) +
  stat_fit_tidy(method = "rq",
                method.args = list(formula = y ~ x), 
                tidy.args = list(se.type = "nid"),
                mapping = aes(label = sprintf('y~"="~%.3g~+~%.3g~x*", with "*italic(P)~"="~%.3f',
                                              after_stat(Intercept_estimate), 
                                              after_stat(x_estimate),
                                              after_stat(x_p.value))),
                parse = TRUE)

This example results in the following plot.
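
The effect of se.type can be checked outside the plot by calling tidy() directly. The sketch below assumes packages 'quantreg', 'broom' and 'ggplot2' are installed and that tidy() for rq objects behaves as described in this answer; the names fit, cols_default and cols_nid are made up for illustration.

```r
# Sketch: compare the columns tidy() returns for an rq fit with and
# without se.type = "nid". Assumes 'quantreg', 'broom', 'ggplot2' installed.
library(quantreg)  # rq()
library(broom)     # tidy()

# Same variables as in the plot: x = displ, y = 1 / hwy
fit <- rq(I(1 / hwy) ~ displ, tau = 0.5, data = ggplot2::mpg)

cols_default <- names(tidy(fit))                   # default se.type
cols_nid     <- names(tidy(fit, se.type = "nid"))  # as passed via tidy.args

cols_default
cols_nid
```

Inside stat_fit_tidy() these columns are combined with the term names, which is where names such as x_estimate and x_p.value in the mapping come from.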

Defining a new stat stat_rq_eqn() would make this even simpler:

stat_rq_eqn <- function(formula = y ~ x, tau = 0.5, ...) {
  stat_fit_tidy(method = "rq",
                method.args = list(formula = formula, tau = tau), 
                tidy.args = list(se.type = "nid"),
                mapping = aes(label = sprintf('y~"="~%.3g~+~%.3g~x*", with "*italic(P)~"="~%.3f',
                                              after_stat(Intercept_estimate), 
                                              after_stat(x_estimate),
                                              after_stat(x_p.value))),
                parse = TRUE,
                ...)
}

With the answer becoming:

m + 
  geom_quantile(quantiles = 0.5) +
  stat_rq_eqn(tau = 0.5)
