ggplot:通过组自动化百分比线 [英] ggplot: percentile lines by group automation
问题描述
我发现 dplyr
%>%
操作符对于简单的ggplot2转换很有帮助(不需诉诸 ggproto
,这是
这样做的一种切割方式是 $ b
事实=数据%>%
group_by(color)%>%
总结(
p50 =分位数(克拉,0.5,na.rm = T),
p90 =分位数(克拉,0.9,na.rm = T)
)
ymax = ggplot_build(g)$ panel $ ranges [[1]] $ y.range [2]
g +
geom_vline(data = facts,aes(xintercept = p50,color = color),linetype =dashed,size = 1)+
geom_vline(data = facts,aes(xintercept = p90,color = color),linetype =dashed,size = 1)+
geom_text(data = facts,aes(x = p50,label = paste(p50 =,p50),y = ymax ,color = color),vjust = 1.5,hjust = 1,size = 4,angle = 90)+
geom_text(x = p90,label = paste(p90 =,p90) ,y = ymax,color = c 1),大小= 4,角度= 90)
I' d爱将它封装成类似于 g%>%gg.percentile.x(c(.5,.9))
的东西,但我一直无法找到按顺序将 aes _
或 aes_string
的使用与图表对象中的分组列的发现相结合的一种好方法正确计算百分位数。我希望得到一些帮助。
我认为最有效的方法来创建所需的剧情由三个步骤组成:
- 编写两个独立的简单统计信息(创建新统计信息部分
备注
-
这个问题在这里解决的方法允许构建更复杂的百分点线和将
x 更改为 y
vline
至hline
,xintercept
到yintercept
在适当的位置,可以为来自y轴的数据定义相同的统计数据; -
当然,如果你喜欢使用
%>%
而不是ggplot2
的+
你可以像定义问题一样在函数中包装定义的统计信息。我个人不建议这样做,因为它违背了ggplot2
的标准用法。
I've found the
dplyr
%>%
operator helpful with simple ggplot2 transformations (without resorting toggproto
, which is required for ggplot2 extensions), e.g.library(ggplot2) library(scales) library(dplyr) gg.histo.pct.by.group <- function(g, ...) { g + geom_histogram(aes(y=unlist(lapply(unique(..group..), function(grp) ..count..[..group..==grp] / sum(..count..[..group..==grp])))), ...) + scale_y_continuous(labels = percent) + ylab("% of total count by group") } data = diamonds %>% select(carat, color) %>% filter(color %in% c('H', 'D')) g = ggplot(data, aes(carat, fill=color)) %>% gg.histo.pct.by.group(binwidth=0.5, position="dodge")
It's common to add some percentile lines with labels to these types of graphs, e.g.,
One cut'n'paste way of doing this is
facts = data %>% group_by(color) %>% summarize( p50=quantile(carat, 0.5, na.rm=T), p90=quantile(carat, 0.9, na.rm=T) ) ymax = ggplot_build(g)$panel$ranges[[1]]$y.range[2] g + geom_vline(data=facts, aes(xintercept=p50, color=color), linetype="dashed", size=1) + geom_vline(data=facts, aes(xintercept=p90, color=color), linetype="dashed", size=1) + geom_text(data=facts, aes(x=p50, label=paste("p50=", p50), y=ymax, color=color), vjust=1.5, hjust=1, size=4, angle=90) + geom_text(data=facts, aes(x=p90, label=paste("p90=", p90), y=ymax, color=color), vjust=1.5, hjust=1, size=4, angle=90)
I'd love to encapsulate this into something like
g %>% gg.percentile.x(c(.5, .9))
but I haven't been able to find a good way to combine the use ofaes_
oraes_string
with the discovery of the grouping columns in the graph object in order to calculate the percentiles correctly. I'd appreciate some help with that.解决方案I think the most efficient way to create the desired plot consists from three steps:
- Write two separate simple stats (following section Creating a new stat from https://cran.r-project.org/web/packages/ggplot2/vignettes/extending-ggplot2.html): one for adding vertical lines at percentile locations and another for adding text labels;
- Combine just written stats into the desired one with parameters as needed;
- Use the results of work.
So the answer also consists from 3 parts.
Part 1. The stat for adding vertical lines at percentile locations should compute those values based on the data in x-axis and return the result in appropriate format. Here is the code:
library(ggplot2) StatPercentileX <- ggproto("StatPercentileX", Stat, compute_group = function(data, scales, probs) { percentiles <- quantile(data$x, probs=probs) data.frame(xintercept=percentiles) }, required_aes = c("x") ) stat_percentile_x <- function(mapping = NULL, data = NULL, geom = "vline", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ...) { layer( stat = StatPercentileX, data = data, mapping = mapping, geom = geom, position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...) ) }
The same goes for the stat for adding text labels (the default location is at the top of the plot):
StatPercentileXLabels <- ggproto("StatPercentileXLabels", Stat, compute_group = function(data, scales, probs) { percentiles <- quantile(data$x, probs=probs) data.frame(x=percentiles, y=Inf, label=paste0("p", probs*100, ": ", round(percentiles, digits=3))) }, required_aes = c("x") ) stat_percentile_xlab <- function(mapping = NULL, data = NULL, geom = "text", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ...) { layer( stat = StatPercentileXLabels, data = data, mapping = mapping, geom = geom, position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...) ) }
Already we have pretty powerful instruments that can be used in any fashion
ggplot2
can provide (colouring, grouping, faceting and so on). For example:set.seed(1401) plot_points <- data.frame(x_val=runif(100), y_val=runif(100), g=sample(1:2, 100, replace=TRUE)) ggplot(plot_points, aes(x=x_val, y=y_val)) + geom_point() + stat_percentile_x(probs=c(0.25, 0.5, 0.75), linetype=2) + stat_percentile_xlab(probs=c(0.25, 0.5, 0.75), hjust=1, vjust=1.5, angle=90) + facet_wrap(~g) # ggsave("Example_stat_percentile.png", width=10, height=5, units="in")
Part 2 Although keeping separate layers for lines and text labels seems pretty natural (despite a little computational inefficiency of computing percentiles twice) adding two layers every time is quite verbose. Especially for this
ggplot2
has simple way of combining layers: put them in the list which is the result function call. The code is as follows:stat_percentile_x_wlabels <- function(probs=c(0.25, 0.5, 0.75)) { list( stat_percentile_x(probs=probs, linetype=2), stat_percentile_xlab(probs=probs, hjust=1, vjust=1.5, angle=90) ) }
With this function previous example can be reproduced via the following command:
ggplot(plot_points, aes(x=x_val, y=y_val)) + geom_point() + stat_percentile_x_wlabels() + facet_wrap(~g)
Note that
stat_percentile_x_wlabels
takes probabilities of the desired percentiles which are then passed toquantile
function. This is the place to specify them.Part 3 Using again the idea of combining layers the plot in your question can be reproduced as follows:
library(scales) library(dplyr) geom_histo_pct_by_group <- function() { list(geom_histogram(aes(y=unlist(lapply(unique(..group..), function(grp) { ..count..[..group..==grp] / sum(..count..[..group..==grp]) }))), binwidth=0.5, position="dodge"), scale_y_continuous(labels = percent), ylab("% of total count by group") ) } data = diamonds %>% select(carat, color) %>% filter(color %in% c('H', 'D')) ggplot(data, aes(carat, fill=color, colour=color)) + geom_histo_pct_by_group() + stat_percentile_x_wlabels(probs=c(0.5, 0.9)) # ggsave("Question_plot.png", width=10, height=6, unit="in")
Remarks
The way this problem is solved here allows constructing more complex plots with percentile lines and labels;
With changing
x
toy
(and vice versa),vline
tohline
,xintercept
toyintercept
in appropriate places one can define the same stats for the data from y-axis;Of course if you like using
%>%
instead ofggplot2
's+
you can wrap defined stats in functions just like you did in question post. Personally I wouldn't recommend that because it goes against standard use ofggplot2
.
这篇关于ggplot:通过组自动化百分比线的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
-