ggplot: percentile lines by group automation


Question


    I've found the dplyr %>% operator helpful with simple ggplot2 transformations (without resorting to ggproto, which is required for ggplot2 extensions), e.g.

    library(ggplot2)
    library(scales)
    library(dplyr)
    
    gg.histo.pct.by.group <- function(g, ...) {
      g + 
        geom_histogram(aes(y=unlist(lapply(unique(..group..), function(grp) ..count..[..group..==grp] / sum(..count..[..group..==grp])))), ...) +
        scale_y_continuous(labels = percent) + 
        ylab("% of total count by group")
    }
    
    data = diamonds %>% select(carat, color) %>% filter(color %in% c('H', 'D'))
    
    g = ggplot(data, aes(carat, fill=color)) %>% 
      gg.histo.pct.by.group(binwidth=0.5, position="dodge")
    

    It's common to add percentile lines with labels to these types of graphs.

    One cut-and-paste way of doing this is:

    facts = data %>% 
      group_by(color) %>% 
      summarize(
        p50=quantile(carat, 0.5, na.rm=T), 
        p90=quantile(carat, 0.9, na.rm=T)
      )
    
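    # Works with the ggplot2 version current when this was asked; in ggplot2 >= 3.0
    # the equivalent is ggplot_build(g)$layout$panel_params[[1]]$y.range[2]
    ymax = ggplot_build(g)$panel$ranges[[1]]$y.range[2]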
    
    g +
      geom_vline(data=facts, aes(xintercept=p50, color=color), linetype="dashed", size=1) +
      geom_vline(data=facts, aes(xintercept=p90, color=color), linetype="dashed", size=1) +
      geom_text(data=facts, aes(x=p50, label=paste("p50=", p50), y=ymax, color=color), vjust=1.5, hjust=1, size=4, angle=90) +
      geom_text(data=facts, aes(x=p90, label=paste("p90=", p90), y=ymax, color=color), vjust=1.5, hjust=1, size=4, angle=90)
    

    I'd love to encapsulate this into something like g %>% gg.percentile.x(c(.5, .9)) but I haven't been able to find a good way to combine the use of aes_ or aes_string with the discovery of the grouping columns in the graph object in order to calculate the percentiles correctly. I'd appreciate some help with that.

    Solution

    I think the most efficient way to create the desired plot consists of three steps:

    1. Write two separate simple stats (following the section "Creating a new stat" in https://cran.r-project.org/web/packages/ggplot2/vignettes/extending-ggplot2.html): one for adding vertical lines at percentile locations and another for adding text labels;
    2. Combine the stats just written into the desired one, with parameters as needed;
    3. Use the result.

    So the answer also consists of three parts.

    Part 1. The stat for adding vertical lines at percentile locations should compute those values from the x-axis data and return the result in the appropriate format. Here is the code:

    library(ggplot2)
    
    StatPercentileX <- ggproto("StatPercentileX", Stat,
      compute_group = function(data, scales, probs) {
        percentiles <- quantile(data$x, probs=probs)
        data.frame(xintercept=percentiles)
        },
      required_aes = c("x")
    )
    
    stat_percentile_x <- function(mapping = NULL, data = NULL, geom = "vline",
                                  position = "identity", na.rm = FALSE,
                                  show.legend = NA, inherit.aes = TRUE, ...) {
      layer(
        stat = StatPercentileX, data = data, mapping = mapping, geom = geom, 
        position = position, show.legend = show.legend, inherit.aes = inherit.aes,
        params = list(na.rm = na.rm, ...)
      )
    }
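To check what the stat hands to the geom, one option is to build a small plot and inspect the computed layer data. The following is a minimal sketch, not part of the original answer: the data frame `toy` is made up, the stat is repeated so the block runs on its own, and the quantile names are dropped with `unname()` only to keep the returned data frame free of row names.

```r
library(ggplot2)

# Same stat as above, repeated so this block is self-contained.
# unname() is an addition here: it strips the "50%"-style names.
StatPercentileX <- ggproto("StatPercentileX", Stat,
  compute_group = function(data, scales, probs) {
    data.frame(xintercept = unname(quantile(data$x, probs = probs)))
  },
  required_aes = c("x")
)

stat_percentile_x <- function(mapping = NULL, data = NULL, geom = "vline",
                              position = "identity", na.rm = FALSE,
                              show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatPercentileX, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

# Made-up data: two groups with clearly different medians.
toy <- data.frame(x_val = c(1, 2, 3, 4, 10, 20, 30, 40),
                  grp = rep(c("a", "b"), each = 4))

p <- ggplot(toy, aes(x = x_val, colour = grp)) +
  stat_percentile_x(probs = 0.5)

# layer_data() returns the data frame the geom receives for layer 1;
# it holds one xintercept per group because compute_group runs per group.
computed <- layer_data(p, 1)
print(computed$xintercept)
```

Because the work happens in `compute_group`, any aesthetic that splits the data into groups (here `colour`) yields a separate set of percentile lines without extra code.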
    

    The same goes for the stat for adding text labels (the default location is at the top of the plot):

    StatPercentileXLabels <- ggproto("StatPercentileXLabels", Stat,
      compute_group = function(data, scales, probs) {
        percentiles <- quantile(data$x, probs=probs)
        data.frame(x=percentiles, y=Inf,
                   label=paste0("p", probs*100, ": ",
                                round(percentiles, digits=3)))
        },
      required_aes = c("x")
    )
    
    stat_percentile_xlab <- function(mapping = NULL, data = NULL, geom = "text",
                                         position = "identity", na.rm = FALSE,
                                         show.legend = NA, inherit.aes = TRUE, ...) {
      layer(
        stat = StatPercentileXLabels, data = data, mapping = mapping, geom = geom, 
        position = position, show.legend = show.legend, inherit.aes = inherit.aes,
        params = list(na.rm = na.rm, ...)
      )
    }
    

    We already have fairly powerful tools that can be combined with anything ggplot2 provides (colouring, grouping, faceting and so on). For example:

    set.seed(1401)
    plot_points <- data.frame(x_val=runif(100), y_val=runif(100),
                              g=sample(1:2, 100, replace=TRUE))
    ggplot(plot_points, aes(x=x_val, y=y_val)) +
      geom_point() +
      stat_percentile_x(probs=c(0.25, 0.5, 0.75), linetype=2) +
      stat_percentile_xlab(probs=c(0.25, 0.5, 0.75), hjust=1, vjust=1.5, angle=90) +
      facet_wrap(~g)
    # ggsave("Example_stat_percentile.png", width=10, height=5, units="in")
    

    Part 2. Although keeping separate layers for lines and text labels seems natural (despite the small inefficiency of computing the percentiles twice), adding two layers every time is verbose. For exactly this case ggplot2 has a simple way of combining layers: put them in a list that is returned from a function call. The code is as follows:

    stat_percentile_x_wlabels <- function(probs=c(0.25, 0.5, 0.75)) {
      list(
        stat_percentile_x(probs=probs, linetype=2),
        stat_percentile_xlab(probs=probs, hjust=1, vjust=1.5, angle=90)
      )
    }
    

    With this function, the previous example can be reproduced with the following command:

    ggplot(plot_points, aes(x=x_val, y=y_val)) +
      geom_point() +
      stat_percentile_x_wlabels() +
      facet_wrap(~g)
    

    Note that stat_percentile_x_wlabels takes the probabilities of the desired percentiles, which are then passed to the quantile function. This is the place to specify them.
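To make the link between `probs` and the rendered labels concrete, the computation inside the label stat can be run on its own in base R. This is a small sketch with a made-up vector `x`; only the label construction is taken from `StatPercentileXLabels`.

```r
# Made-up data, used only for illustration.
x <- 1:11
probs <- c(0.5, 0.9)

# quantile() returns one value per requested probability.
percentiles <- quantile(x, probs = probs)

# Same label construction as in StatPercentileXLabels:
# "p" + probability as a percentage + rounded quantile value.
labels <- paste0("p", probs * 100, ": ", round(percentiles, digits = 3))
print(labels)
```

Each entry of `probs` produces exactly one line and one label per group, so the length of `probs` controls how many percentile markers appear.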

    Part 3. Using the idea of combining layers again, the plot in your question can be reproduced as follows:

    library(scales)
    library(dplyr)
    
    geom_histo_pct_by_group <- function() {
      list(geom_histogram(aes(y=unlist(lapply(unique(..group..),
                                              function(grp) {
                                                ..count..[..group..==grp] /
                                                  sum(..count..[..group..==grp])
                                                }))),
                          binwidth=0.5, position="dodge"),
             scale_y_continuous(labels = percent),
             ylab("% of total count by group")
           )
    }
    
    data = diamonds %>% select(carat, color) %>% filter(color %in% c('H', 'D'))
    
    ggplot(data, aes(carat, fill=color, colour=color)) +
      geom_histo_pct_by_group() +
      stat_percentile_x_wlabels(probs=c(0.5, 0.9))
    # ggsave("Question_plot.png", width=10, height=6, units="in")
    

    Remarks

    1. The way this problem is solved here allows constructing more complex plots with percentile lines and labels;

    2. By swapping x and y, vline and hline, and xintercept and yintercept in the appropriate places, one can define the same stats for y-axis data;

    3. Of course, if you prefer %>% to ggplot2's +, you can wrap the defined stats in functions just as you did in the question. Personally I wouldn't recommend that because it goes against the standard use of ggplot2.
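Remark 2 could look roughly like the sketch below. It is one possible mirror of the x-axis stat and not code from the original answer: the names `StatPercentileY` and `stat_percentile_y` are hypothetical, the data frame `toy` is made up, and `unname()` is used only to avoid carrying the quantile names into the result.

```r
library(ggplot2)

# Hypothetical y-axis counterpart of StatPercentileX:
# quantiles are taken from data$y and returned as yintercept.
StatPercentileY <- ggproto("StatPercentileY", Stat,
  compute_group = function(data, scales, probs) {
    data.frame(yintercept = unname(quantile(data$y, probs = probs)))
  },
  required_aes = c("y")
)

# Layer constructor; the default geom is "hline" instead of "vline".
stat_percentile_y <- function(mapping = NULL, data = NULL, geom = "hline",
                              position = "identity", na.rm = FALSE,
                              show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatPercentileY, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

# Made-up data for illustration.
toy <- data.frame(x_val = 1:8, y_val = c(2, 4, 6, 8, 10, 12, 14, 16))

p <- ggplot(toy, aes(x = x_val, y = y_val)) +
  geom_point() +
  stat_percentile_y(probs = c(0.25, 0.75), linetype = 2)

# Layer 2 is the percentile layer; inspect the computed intercepts.
computed <- layer_data(p, 2)
print(computed$yintercept)
```

A matching label stat would follow the same pattern, returning `y = percentiles` and `x = Inf` (or `-Inf`) so the text sits at the edge of the panel.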
