R: ggplot stacked bar chart with counts on y axis but percentage as label
Problem description
I'm looking for a way to label a stacked bar chart with percentages while the y-axis shows the original count (using ggplot). Here is an MWE for the plot without labels:
library(ggplot2)
df <- as.data.frame(matrix(nrow = 7, ncol = 3,
                           data = c("ID1", "ID2", "ID3", "ID4", "ID5", "ID6", "ID7",
                                    "north", "north", "north", "north", "south", "south", "south",
                                    "A", "B", "B", "C", "A", "A", "C"),
                           byrow = FALSE))
colnames(df) <- c("ID", "region", "species")
p <- ggplot(df, aes(x = region, fill = species))
p + geom_bar()
I have a much larger table, and R counts the different species for every region quite nicely. Now I would like to show both the original count value (preferably on the y-axis) and the percentage (as a label) to compare proportions of species between regions.
I tried out many things using geom_text(), but I think the main difference from other questions (e.g. this one) is that
- I do not have a separate column for y values (they are just the counts of different species per region) and
- I need the labels per region to sum up to 100% (since they are considered to represent separate populations), not all labels of the entire plot.
Any help is much appreciated!!
As @Gregor mentioned, summarize the data separately and then feed the data summary to ggplot. In the code below, we use dplyr to create the summary on the fly:
library(dplyr)
ggplot(df %>% count(region, species) %>%       # Count rows for each region/species combination
         group_by(region) %>%                  # Group explicitly so sum(n) below is the region total
         mutate(pct = n/sum(n),                # Calculate percent within each region
                ypos = cumsum(n) - 0.5*n),     # Calculate label positions (midpoint of each segment)
       aes(region, n, fill = species)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = paste0(sprintf("%1.1f", pct*100), "%"), y = ypos))
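To check the numbers before plotting, the summary can also be built as its own data frame. This is only a sketch: the names species_summary and pct_check are made up for illustration, and the question's df is rebuilt with data.frame() so the block runs on its own. The explicit group_by(region) is included because current dplyr versions return an ungrouped result when count() is called on ungrouped input, so without it sum(n) would be the total over the whole table.

```r
library(dplyr)

# Same data as in the question, rebuilt so this block is self-contained
df <- data.frame(
  ID      = paste0("ID", 1:7),
  region  = c("north", "north", "north", "north", "south", "south", "south"),
  species = c("A", "B", "B", "C", "A", "A", "C")
)

# One row per region/species combination, with the share inside each region
species_summary <- df %>%
  count(region, species) %>%      # n = number of rows per combination
  group_by(region) %>%            # percentages are computed per region
  mutate(pct = n / sum(n)) %>%
  ungroup()

# Sanity check: the shares should add up to 1 within every region
pct_check <- species_summary %>%
  group_by(region) %>%
  summarise(total = sum(pct))

print(species_summary)
print(pct_check)
```

For the example data this gives shares of 0.25, 0.5 and 0.25 for species A, B and C in the north, and 2/3 and 1/3 for A and C in the south, so each region's labels sum to 100%.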
Update: With ggplot2 2.2.0 and later, you no longer need to provide a y-value to center the text within each bar. Instead, you can use position_stack(vjust=0.5):
ggplot(df %>% count(region, species) %>%       # Count rows for each region/species combination
         group_by(region) %>%                  # Group explicitly so sum(n) below is the region total
         mutate(pct = n/sum(n)),               # Calculate percent within each region
       aes(region, n, fill = species)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = paste0(sprintf("%1.1f", pct*100), "%")),
            position = position_stack(vjust = 0.5))   # Center each label in its segment
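If you would rather not depend on dplyr, the same kind of summary can be produced with base R. The following is a rough sketch, not part of the original answer: counts and label are illustrative names, and the question's df is rebuilt so the block is self-contained. table() does the counting and ave() supplies the per-region totals.

```r
# Same data as in the question, rebuilt so this block is self-contained
df <- data.frame(
  ID      = paste0("ID", 1:7),
  region  = c("north", "north", "north", "north", "south", "south", "south"),
  species = c("A", "B", "B", "C", "A", "A", "C")
)

# Cross-tabulate region by species and turn the table into long format
counts <- as.data.frame(table(region = df$region, species = df$species),
                        responseName = "n")

# Drop combinations that never occur (here: species B in the south)
counts <- counts[counts$n > 0, ]

# Share of each species within its region, plus a formatted label
counts$pct   <- counts$n / ave(counts$n, counts$region, FUN = sum)
counts$label <- sprintf("%1.1f%%", counts$pct * 100)

print(counts)
```

The resulting data frame has the same n and pct columns as the dplyr summary, so it can be passed to ggplot() with aes(region, n, fill = species) and the label column mapped in geom_text().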