Creating geom / stat from scratch


Question

I started working with R not long ago, and I am currently trying to strengthen my visualization skills. What I want to do is create boxplots with mean diamonds as a layer on top (see picture in the link below). I did not find any function that does this already, so I guess I have to create it myself.

What I was hoping to do was to create a geom or a stat that would allow something like this to work:

ggplot(data, aes(...)) +
   geom_boxplot(...) +
   geom_meanDiamonds(...)

I have no idea where to start in order to build this new function. I know which values are needed for the mean diamonds (the mean and its confidence interval), but I do not know how to build a geom / stat that takes the data from ggplot(), calculates the mean and CI for each group, and plots a mean diamond on top of each boxplot.

I have searched for detailed descriptions of how to build these types of functions from scratch, but I have not found anything that really starts from the bottom. I would really appreciate it if anyone could point me towards some useful guides.

Thanks!

Recommended answer

I'm currently learning to write geoms myself, so this is going to be a rather long & rambling post as I go through my thought processes, untangling the Geom aspects (creating polygons & line segments) from the Stat aspects (calculating where these polygons & segments should be) of a geom.

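Disclaimer: I'm not familiar with this kind of plot, and Google didn't turn up many authoritative guides. My understanding of how the confidence interval is calculated / used here may be incorrect.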

Step 0. Understand the relationship between a geom / stat and a layer function.

geom_boxplot and stat_boxplot are examples of layer functions. If you enter them into the R console, you'll see that they are (relatively) short, and do not contain the actual code for calculating the box / whiskers of the boxplot. Instead, geom_boxplot contains a line that says geom = GeomBoxplot, while stat_boxplot contains a line that says stat = StatBoxplot (reproduced below).

> stat_boxplot
function (mapping = NULL, data = NULL, geom = "boxplot", position = "dodge2", 
    ..., coef = 1.5, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) 
{
    layer(data = data, mapping = mapping, stat = StatBoxplot, 
        geom = geom, position = position, show.legend = show.legend, 
        inherit.aes = inherit.aes, params = list(na.rm = na.rm, 
            coef = coef, ...))
}

GeomBoxplot and StatBoxplot are ggproto objects. They are where the magic happens.
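A quick way to convince yourself of this is to inspect the two objects from the console. The sketch below assumes only that ggplot2 is installed; the variable names are made up for illustration.

```r
library(ggplot2)

# Both objects carry the "ggproto" class; Stat* objects descend from Stat,
# Geom* objects from Geom.
class(StatBoxplot)
class(GeomBoxplot)

# The layer function only wires things together. The computation itself lives
# in methods stored on the ggproto object, such as compute_group.
stat_is_ggproto   <- inherits(StatBoxplot, "ggproto") && inherits(StatBoxplot, "Stat")
geom_is_ggproto   <- inherits(GeomBoxplot, "ggproto") && inherits(GeomBoxplot, "Geom")
has_compute_group <- is.function(StatBoxplot$compute_group)
```

Typing StatBoxplot$compute_group on its own prints the method's source, which is a handy way to read what a stat actually does.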

Step 1. Recognise that ggproto()'s _inherit parameter is your friend.

Don't reinvent the wheel. Since we want to create something that overlaps nicely with a boxplot, we can take reference from the Geom / Stat used for that, and only change what's necessary.

StatMeanDiamonds <- ggproto(
  `_class` = "StatMeanDiamonds",
  `_inherit` = StatBoxplot,
  ... # add functions here to override those defined in StatBoxplot
)

GeomMeanDiamonds <- ggproto(
  `_class` = "GeomMeanDiamonds",
  `_inherit` = GeomBoxplot,
  ... # as above
)
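To see what inheritance buys us, here is a toy sketch that is unrelated to boxplots. Parent and Child are hypothetical objects invented for this illustration; only ggproto() itself comes from ggplot2.

```r
library(ggplot2)

# A hypothetical parent object with two methods.
Parent <- ggproto(
  "Parent", NULL,
  greet = function(self) "hello",
  name  = function(self) "parent"
)

# The child overrides name() and inherits greet() untouched.
Child <- ggproto(
  "Child", Parent,
  name = function(self) "child"
)

Child$greet()  # inherited from Parent
Child$name()   # overridden in Child
```

StatMeanDiamonds and GeomMeanDiamonds follow the same pattern: anything we do not override keeps behaving as it does in StatBoxplot / GeomBoxplot.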

Step 2. Modify the Stat.

There are 3 functions defined within StatBoxplot: setup_data, setup_params, and compute_group. You can refer to the ggplot2 source code on GitHub for the details, or view them by entering, for example, StatBoxplot$compute_group.

The compute_group function calculates the ymin / lower / middle / upper / ymax values from all the y values associated with each group (i.e. each unique x value), and these are used to draw the box plot. We can override it with a function that calculates the mean and its confidence interval instead:

# ci is added as a parameter, to allow the user to specify different confidence intervals
compute_group_new <- function(data, scales, width = NULL, 
                              ci = 0.95, na.rm = FALSE){
  a <- mean(data$y)
  s <- sd(data$y)
  n <- sum(!is.na(data$y))
  error <- qt(ci + (1-ci)/2, df = n-1) * s / sqrt(n)
  stats <- c("lower" = a - error, "mean" = a, "upper" = a + error)

  if(length(unique(data$x)) > 1) width <- diff(range(data$x)) * 0.9

  df <- as.data.frame(as.list(stats))

  df$x <- if(is.factor(data$x)) data$x[1] else mean(range(data$x))
  df$width <- width

  df
}
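As a sanity check on the interval formula used above, the following base R sketch recomputes it by hand for one group of iris and compares it against t.test(). The variable names are for illustration only, and this assumes the intended interval is the usual t-based confidence interval for a mean.

```r
# Sepal lengths for a single group.
y  <- iris$Sepal.Length[iris$Species == "setosa"]
ci <- 0.95
n  <- length(y)

# Same formula as in compute_group_new: t quantile times the standard error.
error  <- qt(ci + (1 - ci) / 2, df = n - 1) * sd(y) / sqrt(n)
manual <- c(mean(y) - error, mean(y) + error)

# Reference interval from base R's one-sample t-test.
reference <- as.numeric(t.test(y, conf.level = ci)$conf.int)

all.equal(manual, reference)
```

Note that ci + (1 - ci) / 2 is just the upper-tail quantile 1 - (1 - ci) / 2, so a 95% interval uses the 97.5% quantile of the t distribution.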

(Optional) StatBoxplot has provision for the user to include weight as an aesthetic mapping. We can allow for that as well, by replacing:

  a <- mean(data$y)
  s <- sd(data$y)
  n <- sum(!is.na(data$y))

with:

  if(!is.null(data$weight)) {
    a <- Hmisc::wtd.mean(data$y, weights = data$weight)
    s <- sqrt(Hmisc::wtd.var(data$y, weights = data$weight))
    n <- sum(data$weight[!is.na(data$y) & !is.na(data$weight)])
  } else {
    a <- mean(data$y)
    s <- sd(data$y)
    n <- sum(!is.na(data$y))
  }

There's no need to change the other functions in StatBoxplot. So we can define StatMeanDiamonds as follows:

StatMeanDiamonds <- ggproto(
  `_class` = "StatMeanDiamonds",
  `_inherit` = StatBoxplot,
  compute_group = compute_group_new
)

Step 3. Modify the Geom.

GeomBoxplot has 3 functions: setup_data, draw_group, and draw_key. It also includes definitions for default_aes and required_aes.

Since we've changed the upstream data source (the data produced by StatMeanDiamonds contain the calculated columns "lower" / "mean" / "upper", while the data produced by StatBoxplot would have contained the calculated columns "ymin" / "lower" / "middle" / "upper" / "ymax"), do check whether the downstream setup_data function is affected as well. (In this case, GeomBoxplot$setup_data makes no reference to the affected columns, so no changes are required here.)
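If you want to see these calculated columns for yourself, layer_data() returns the data frame a layer ends up with after its stat has run. The sketch below uses an ordinary boxplot on iris; p and box_data are names chosen for illustration.

```r
library(ggplot2)

p <- ggplot(iris, aes(Species, Sepal.Length)) +
  geom_boxplot()

# Data for the first (and only) layer, after StatBoxplot has been applied:
# one row per group, with the five box / whisker statistics as columns.
box_data <- layer_data(p, 1)
box_data[, c("x", "ymin", "lower", "middle", "upper", "ymax")]
```

Running the same call on a layer that uses a different stat is a simple way to confirm which columns its geom will receive.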

The draw_group function takes the data produced by StatMeanDiamonds and set up by setup_data, and produces multiple data frames: "common" contains the aesthetic mappings common to all geoms, "diamond.df" contains the mappings that contribute towards the diamond polygon, and "segment.df" contains the mappings that contribute towards the horizontal line segment at the mean. The data frames are then passed to the draw_panel functions of GeomPolygon and GeomSegment respectively, to produce the actual polygons / line segments.

draw_group_new = function(data, panel_params, coord,
                      varwidth = FALSE){
  common <- data.frame(colour = data$colour, 
                       size = data$size,
                       linetype = data$linetype, 
                       fill = alpha(data$fill, data$alpha),
                       group = data$group, 
                       stringsAsFactors = FALSE)
  diamond.df <- data.frame(x = c(data$x, data$xmax, data$x, data$xmin),
                           y = c(data$upper, data$mean, data$lower, data$mean),
                           alpha = data$alpha,
                           common,
                           stringsAsFactors = FALSE)
  segment.df <- data.frame(x = data$xmin, xend = data$xmax,
                           y = data$mean, yend = data$mean,
                           alpha = NA,
                           common,
                           stringsAsFactors = FALSE)
  ggplot2:::ggname("geom_meanDiamonds",
                   grid::grobTree(
                     GeomPolygon$draw_panel(diamond.df, panel_params, coord),
                     GeomSegment$draw_panel(segment.df, panel_params, coord)
                   ))
}
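The vertex ordering in diamond.df is easier to follow with numbers plugged in. The sketch below uses a single hypothetical row (the values are invented, not computed from any data) shaped like what draw_group receives for one group.

```r
# One hypothetical group: centred at x = 1, with the box spanning xmin to xmax.
row <- data.frame(x = 1, xmin = 0.85, xmax = 1.15,
                  lower = 4.9, mean = 5.0, upper = 5.1)

# Four vertices in drawing order: top, right, bottom, left.
diamond <- data.frame(
  x = c(row$x,     row$xmax, row$x,     row$xmin),
  y = c(row$upper, row$mean, row$lower, row$mean)
)
diamond
```

The top and bottom points sit at the confidence limits, while the left and right points sit at the mean, which is also where the horizontal segment is drawn.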

The draw_key function is used to create the legend for this layer, should the need arise. Since GeomMeanDiamonds inherits from GeomBoxplot, the default is draw_key = draw_key_boxplot, and we don't have to change it: leaving it as is will not break the code. However, I think a simpler legend such as draw_key_polygon offers a less cluttered look.

GeomBoxplot's default_aes specifications look fine. But we need to change required_aes, since the data we expect to get from StatMeanDiamonds is different ("lower" / "mean" / "upper" instead of "ymin" / "lower" / "middle" / "upper" / "ymax").

We are now ready to define GeomMeanDiamonds:

GeomMeanDiamonds <- ggproto(
  "GeomMeanDiamonds",
  GeomBoxplot,
  draw_group = draw_group_new,
  draw_key = draw_key_polygon,
  required_aes = c("x", "lower", "upper", "mean")
)

Step 4. Define the layer functions.

This is the boring part. I copied from geom_boxplot / stat_boxplot directly, removing all references to outliers in geom_meanDiamonds, changing to geom = GeomMeanDiamonds / stat = StatMeanDiamonds, and adding ci = 0.95 to stat_meanDiamonds.

geom_meanDiamonds <- function(mapping = NULL, data = NULL,
                              stat = "meanDiamonds", position = "dodge2",
                              ..., varwidth = FALSE, na.rm = FALSE, show.legend = NA,
                              inherit.aes = TRUE){
  if (is.character(position)) {
    if (varwidth == TRUE) position <- position_dodge2(preserve = "single")
  } else {
    if (identical(position$preserve, "total") & varwidth == TRUE) {
      warning("Can't preserve total widths when varwidth = TRUE.", call. = FALSE)
      position$preserve <- "single"
    }
  }
  layer(data = data, mapping = mapping, stat = stat,
        geom = GeomMeanDiamonds, position = position,
        show.legend = show.legend, inherit.aes = inherit.aes,
        params = list(varwidth = varwidth, na.rm = na.rm, ...))
}

stat_meanDiamonds <- function(mapping = NULL, data = NULL,
                         geom = "meanDiamonds", position = "dodge2",
                         ..., ci = 0.95,
                         na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) {
  layer(data = data, mapping = mapping, stat = StatMeanDiamonds,
        geom = geom, position = position, show.legend = show.legend,
        inherit.aes = inherit.aes,
        params = list(na.rm = na.rm, ci = ci, ...))
}

Step 5. Check the output.

# basic
ggplot(iris, 
       aes(Species, Sepal.Length)) +
  geom_boxplot() +
  geom_meanDiamonds()

# with additional parameters, to see if they break anything
ggplot(iris, 
       aes(Species, Sepal.Length)) +
  geom_boxplot(width = 0.8) +
  geom_meanDiamonds(aes(fill = Species),
                    color = "red", alpha = 0.5, size = 1, 
                    ci = 0.99, width = 0.3)
