Create custom geom to compute summary statistics and display them *outside* the plotting region

Problem Description

I am the creator of the R package EnvStats.

There is a function I use quite often called stripChart. I am just starting to learn ggplot2, and have spent the past several days poring over Hadley's book, Winston’s book, StackOverflow, and other resources in an attempt to create a geom that approximates what stripChart does. I am unable to figure out how to, within the geom, compute summary statistics and test results and then place them below the x-axis tick marks and also at the top of the plot (outside the plotting region). Here is a simple example using the built-in dataset mtcars:

library(EnvStats)
stripChart(mpg ~ cyl, data = mtcars, col = 1:3, 
  xlab = "Number of Cylinders", ylab = "Miles per Gallon", p.value = TRUE)

Here is an early draft of a geom to try to reproduce most of the functionality of stripChart:

geom_stripchart <- 
function(..., x.nudge = 0.3, 
  jitter.params = list(width = 0.3, height = 0), 
  mean.params = list(size = 2, position = position_nudge(x = x.nudge)), 
  errorbar.params = list(size = 1, width = 0.1, 
  position = position_nudge(x = x.nudge)), 
  n.text = TRUE, mean.sd.text = TRUE, p.value = FALSE) {
    params <- list(...)
    jitter.params   <- modifyList(params, jitter.params)
    mean.params     <- modifyList(params, mean.params)
    errorbar.params <- modifyList(params, errorbar.params)

    jitter <- do.call("geom_jitter", jitter.params)
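    # 'fun' replaces 'fun.y', which was deprecated in ggplot2 3.3.0
    mean   <- do.call("stat_summary", modifyList(
      list(fun = "mean", geom = "point"), 
      mean.params)
    )
    # mean_cl_normal() requires the Hmisc package to be installed
    errorbar <- do.call("stat_summary", modifyList(
      list(fun.data = "mean_cl_normal", geom = "errorbar"), 
      errorbar.params)
    )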

    stripchart.list <- list(
      jitter, 
      theme(legend.position = "none"),
      mean, 
      errorbar
    )

    if(n.text || mean.sd.text) {
      # Compute summary statistics (sample size, mean, SD) here?
      if(n.text) {
        # Add information to stripchart.list to 
        # compute sample size per group and add text below x-axis
      }
      if(mean.sd.text) {
        # Add information to stripchart.list to 
        # compute mean and SD and add text above top of plotting region
      }
    }
    if(p.value) {
      # Add information to stripchart.list to 
      # compute p-value (and 95% CI for difference if only 2 groups) 
      # and add text above top of plotting region
    }
    stripchart.list
}


library(ggplot2)
dev.new()
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, color = factor(cyl)))
p + geom_stripchart() + 
    xlab("Number of Cylinders") + 
    ylab("Miles per Gallon")

You can see that the plots are pretty much the same. The problem I’m having is figuring out how to add the sample size below each group, and to add the means and standard deviations at the top, along with the result of the ANOVA test (ignoring the issue of unequal variances at this point). I know it is straightforward to compute summary statistics and then plot them as points or text within the plotting area, but I don’t want to do that.

I have already found examples showing how to place text outside the plot (e.g., using annotation_custom()):
How can I add annotations below the x axis in ggplot2?

Displaying text below the plot generated by ggplot2

The problem is that the examples show how to do this where the user has pre-defined what the annotation is. My problem is that within geom_stripchart, I have to compute summary statistics and test results based on the data that was defined in the call to ggplot(), and then pass those results to annotation_custom(). I don’t know how to get at the x and y variables that are defined in the call to ggplot().

Solution

I posted a simpler version of this question here: ggplot2: Adding sample size information to x-axis tick labels
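
For the sample-size part, one possible approach is sketched below. It is not necessarily how EnvStats does it. The idea is that stat_summary() calls a summary function on each group's y values, so the counts come from the data given to ggplot() rather than from a pre-built annotation. The helper name n_label and the vjust and margin values are assumptions chosen for illustration; the spacing will likely need tuning for a given device and theme.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2 or EnvStats): receives the y values
# of one group and returns the label plus the y position at which to draw it.
n_label <- function(y) {
  data.frame(y = -Inf, label = paste0("n=", length(y)))
}

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, color = factor(cyl))) +
  geom_jitter(width = 0.3, height = 0) +
  # y = -Inf anchors the text at the bottom edge of the panel;
  # vjust > 1 then pushes it downward, below the panel
  stat_summary(fun.data = n_label, geom = "text", vjust = 3.5) +
  # clip = "off" allows drawing outside the plotting region
  coord_cartesian(clip = "off") +
  theme(legend.position = "none",
        # extra space so the axis title does not overlap the counts
        axis.title.x = element_text(margin = margin(t = 25)))

p
```

Because the counts are computed inside the layer, the same layer can be added to any plot with a discrete x and a continuous y without knowing the data in advance.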

I have updated the EnvStats package to include a geom called geom_stripchart which is an adaptation of the EnvStats function stripChart. See the help file for geom_stripchart for more information and a list of examples. Below is a simple example:

library(ggplot2)
library(EnvStats)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, color = factor(cyl))) 

p + geom_stripchart(test.text = TRUE) + 
  labs(x = "Number of Cylinders", y = "Miles per Gallon")
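
For readers who want to see how a layer can reach the x and y variables defined in the call to ggplot(), here is a minimal sketch using a custom Stat. It is an illustration under stated assumptions, not the EnvStats implementation: StatAnovaText and stat_anova_text are hypothetical names, and the test is a plain one-way ANOVA that ignores unequal variances. ggplot2 passes the mapped data for each panel to compute_panel(), so the test result can be computed there and handed to a text geom.

```r
library(ggplot2)

# Hypothetical Stat for illustration: compute_panel() receives all rows of a
# panel with the aesthetics already mapped to columns named x and y.
StatAnovaText <- ggproto("StatAnovaText", Stat,
  required_aes = c("x", "y"),
  compute_panel = function(data, scales) {
    x_num <- as.numeric(data$x)  # discrete x arrives as integer positions
    fit <- aov(y ~ g, data = data.frame(y = data$y, g = factor(x_num)))
    p_val <- summary(fit)[[1]][["Pr(>F)"]][1]
    # Return a single row: the label, centred horizontally, at the panel top
    data.frame(
      x = mean(range(x_num)),
      y = Inf,
      label = paste0("ANOVA p = ", format.pval(p_val, digits = 2))
    )
  }
)

# Hypothetical layer constructor wrapping the Stat with a text geom
stat_anova_text <- function(mapping = NULL, data = NULL, ...,
                            vjust = -1, na.rm = FALSE) {
  layer(
    stat = StatAnovaText, geom = "text", data = data, mapping = mapping,
    position = "identity", show.legend = FALSE, inherit.aes = TRUE,
    params = list(na.rm = na.rm, vjust = vjust, ...)
  )
}

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_jitter(width = 0.3, height = 0) +
  stat_anova_text() +
  # allow drawing above the panel and leave room for the text there
  coord_cartesian(clip = "off") +
  theme(plot.margin = margin(t = 25, r = 5, b = 5, l = 5)) +
  labs(x = "Number of Cylinders", y = "Miles per Gallon")

p
```

Using compute_panel() rather than compute_group() matters here: the test needs every group at once, whereas compute_group() would see only one group's values at a time.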
