Adding sample size to a box plot at the min or max of the facet in ggplot
Problem description
There are plenty of explanations, including this good one, of how to label box plots with sample size. All of them seem to use max(x) or median(x) to position the sample size.
I'm wondering if there is a way to easily position the labels at the top or bottom of the plot, especially when using the scales = "free_y" argument in a facet, where ggplot picks the maximum and minimum of the axis automatically for each facet.
The reason is that I am creating multiple facets where the distributions are narrow and the facets are small. It would be easier to read the sample size if it were positioned at the top or bottom of the plot, but I'd like to use "free_y" because there are meaningful differences in some facets that are obscured by the facets that have much larger spans in the data.
Using a slightly modified example from the linked post:
library(ggplot2)

# function for number of observations
give.n <- function(x){
  # experiment with the multiplier to find the perfect position
  return(c(y = median(x)*1.05, label = length(x)))
}
# function for mean labels
mean.n <- function(x){
  # experiment with the multiplier to find the perfect position
  return(c(y = median(x)*0.97, label = round(mean(x),2)))
}
# plot
ggplot(mtcars, aes(factor(cyl), mpg, label=rownames(mtcars))) +
  geom_boxplot(fill = "grey80", colour = "#3366FF") +
  stat_summary(fun.data = give.n, geom = "text", fun.y = median) +
  stat_summary(fun.data = mean.n, geom = "text", fun.y = mean, colour = "red") +
  facet_grid(cyl~., scales="free_y")
Given this setup, how could I find the min or max of the y axis for each facet and position the sample size there instead of at the median, min or max of each box-and-whisker?
EDIT
I'm updating the question with information from R.S.'s answer below. The question is not fully answered yet, but their suggestion shows where to find this information.
ggplot_build(gg)$layout$panel_ranges[[order(levels(factor(mtcars$cyl)))[1]]]$y.range[1]
gives the minimum of the y range for the first level of factor(mtcars$cyl). So, by my logic, we need to build the plot without the stat_summary statements, then find the sample size and the minimum of the y range using the give.n function. After that, we can add the stat_summary statement to the plot, like below:
# plot
gg = ggplot(mtcars, aes(factor(cyl), mpg, label=rownames(mtcars))) +
  geom_boxplot(fill = "grey80", colour = "#3366FF") +
  facet_grid(cyl~., scales="free_y")
# function for number of observations
give.n <- function(x){
  # experiment with the multiplier to find the perfect position
  return(c(y = ggplot_build(gg)$layout$panel_ranges[[order(levels(factor(mtcars$cyl)))[x]]]$y.range[1], label = length(x)))
}
gg +
  stat_summary(fun.data = give.n, geom = "text", fun.y = "median")
But the above code doesn't work, because I don't really understand what the give.n function is iterating over. Replacing [[x]] with any of 1:3 plots all the sample sizes at the minimum for that facet, so that is progress.
Here is the plot using [[2]], so all sample sizes are plotted at 17.62, the minimum value of the range for the second facet.
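One way to see why indexing with [[x]] fails: stat_summary() hands fun.data the numeric vector of y values for one group at a time, so x is a vector of mpg values rather than a panel index. The sketch below emulates that per-group call in base R. The names mpg_by_cyl and per_group are introduced here for illustration, and the exact grouping done inside ggplot2 is assumed rather than reproduced.

```r
# Helper from the question: x is a numeric vector of y values for one group.
give.n <- function(x) {
  c(y = median(x) * 1.05, label = length(x))
}

# Emulate the per-group calls: split mpg by cyl and call the helper once per group.
mpg_by_cyl <- split(mtcars$mpg, mtcars$cyl)   # hypothetical name
per_group <- lapply(mpg_by_cyl, give.n)       # hypothetical name

# Each element holds the label position (y) and the sample size (label).
per_group
```

Because x holds data values such as 21.0 or 22.8, using it as a list index into the panel ranges cannot select the matching facet.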
You can examine the structure of the ggplot object using ggplot_build; in particular, the x and y panel ranges are stored in layout. Assign your plot to an object and look at the structure:
gg <- ggplot(mtcars, aes(factor(cyl), mpg, label=rownames(mtcars))) +
  geom_boxplot(fill = "grey80", colour = "#3366FF") +
  stat_summary(fun.data = give.n, geom = "text", fun.y = median) +
  stat_summary(fun.data = mean.n, geom = "text", fun.y = mean, colour = "red") +
  facet_grid(cyl~., scales="free_y")
ggplot_build(gg)
In particular you will be interested in:
ggplot_build(gg)$layout$panel_ranges
The y limits of the 3 panels are given as c(ymin, ymax) and stored under:
ggplot_build(gg)$layout$panel_ranges[[1]]$y.range
ggplot_build(gg)$layout$panel_ranges[[2]]$y.range
ggplot_build(gg)$layout$panel_ranges[[3]]$y.range
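The name of this slot appears to depend on the ggplot2 version: as far as I know, panel_ranges was used in the 2.2.x series and later releases store the same information under panel_params. The following is a minimal sketch that tries both. The helper panel_ymin is hypothetical (not part of ggplot2), and the presence of y.range in each panel entry is an assumption that holds for the default Cartesian coordinates.

```r
library(ggplot2)

# Hypothetical helper: lower y limit of every panel in a built plot.
panel_ymin <- function(plot) {
  layout <- ggplot_build(plot)$layout
  # Assumption: newer ggplot2 uses panel_params, older versions panel_ranges.
  panels <- if (!is.null(layout$panel_params)) {
    layout$panel_params
  } else {
    layout$panel_ranges
  }
  vapply(panels, function(p) p$y.range[1], numeric(1))
}

p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot() +
  facet_grid(cyl ~ ., scales = "free_y")

# One value per facet, in panel order (cyl 4, 6, 8).
panel_ymin(p)
```

The values returned depend on the layers in the plot, since text layers placed below the boxes also extend the panel range.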
Edited to respond to a comment asking how to incorporate this layout info into the plot. Here we calculate the summary statistics grouped by cyl separately using dplyr, and create a separate data frame to pass to ggplot2 instead of using stat_summary.
library(dplyr)
gg.summary <- group_by(mtcars, cyl) %>% summarise(mean=mean(mpg), median=median(mpg), length=length(mpg))
Extract the lower y limit of each panel and add it to the summary data frame, which is grouped by cyl, the variable we are faceting on:
gg.summary$panel.ylim <- sapply(order(levels(factor(mtcars$cyl))), function(x) ggplot_build(gg)$layout$panel_ranges[[x]]$y.range[1])
# # A tibble: 3 x 5
#     cyl     mean median length panel.ylim
#   <dbl>    <dbl>  <dbl>  <int>      <dbl>
# 1     4 26.66364   26.0     11     20.775
# 2     6 19.74286   19.7      7     17.620
# 3     8 15.10000   15.2     14      9.960
Use it in ggplot; I believe this is the plot you want:
gg +
  geom_text(data=gg.summary, aes(x=factor(cyl), y=panel.ylim, label=paste("n =", length))) +
  geom_text(data=gg.summary, aes(x=factor(cyl), y=median*0.97, label=format(median, nsmall=2)))
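A different technique from the one above, offered only as a sketch: ggplot2 treats -Inf and Inf in a continuous position aesthetic as the panel edge, so a label can be pinned to the bottom of each facet without reading the layout. The counts data frame and the vjust value are choices made here for illustration and are not from the original answer.

```r
library(ggplot2)

# Sample size per cyl; cyl stays numeric so it matches the faceting variable.
counts <- aggregate(mpg ~ cyl, data = mtcars, FUN = length)
names(counts)[2] <- "n"

p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot(fill = "grey80", colour = "#3366FF") +
  facet_grid(cyl ~ ., scales = "free_y") +
  # y = -Inf places the label on the lower panel edge;
  # a negative vjust nudges the text up so it stays inside the panel.
  geom_text(data = counts,
            aes(x = factor(cyl), y = -Inf, label = paste("n =", n)),
            vjust = -0.5)

p
```

Using Inf with a positive vjust (for example 1.5) should place the labels at the top of each facet instead.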