R: ggplot stacked bar chart with counts on y axis but percentage as label
Question
I'm looking for a way to label a stacked bar chart with percentages while the y-axis shows the original counts (using ggplot). Here is an MWE for the plot without labels:
library(ggplot2)
df <- as.data.frame(matrix(nrow = 7, ncol= 3,
data = c("ID1", "ID2", "ID3", "ID4", "ID5", "ID6", "ID7",
"north", "north", "north", "north", "south", "south", "south",
"A", "B", "B", "C", "A", "A", "C"),
byrow = FALSE))
colnames(df) <- c("ID", "region", "species")
p <- ggplot(df, aes(x = region, fill = species))
p + geom_bar()
I have a much larger table, and R counts the different species for every region quite nicely. Now I would like to show both the original count (preferably on the y-axis) and the percentage (as a label), to compare the proportions of species between regions.
I tried many things using geom_text(), but I think the main differences from other questions (e.g. this one) are that
- I do not have a separate column for y values (they are just the counts of different species per region) and
- I need the labels within each region to sum to 100% (since the regions are considered to represent separate populations), rather than all labels across the entire plot.
Any help is much appreciated!!
Answer
As @Gregor mentioned, summarize the data separately and then feed the summary to ggplot. In the code below, we use dplyr to create the summary on the fly:
library(dplyr)
ggplot(df %>% count(region, species) %>%    # Count rows for each region/species combination
         group_by(region) %>%                # Group by region so pct is computed within each region
         mutate(pct = n / sum(n),            # Share of each species within its region
                ypos = cumsum(n) - 0.5 * n), # Label positions; assumes bars stack bottom-up in data order (ggplot2 before 2.2.0)
       aes(region, n, fill = species)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = paste0(sprintf("%1.1f", pct * 100), "%"), y = ypos))
Update: With ggplot2 2.2.0 and later, you no longer need to provide a y-value to center the text within each bar. Instead you can use position_stack(vjust=0.5):
ggplot(df %>% count(region, species) %>%    # Count rows for each region/species combination
         group_by(region) %>%                # Group by region so pct is computed within each region
         mutate(pct = n / sum(n)),           # Share of each species within its region
       aes(region, n, fill = species)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = paste0(sprintf("%1.1f", pct * 100), "%")),
            position = position_stack(vjust = 0.5))
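To confirm that the labels really sum to 100% within each region, it can help to build the summary as a separate data frame before plotting. The following is a minimal sketch, assuming the example data from the question and reasonably recent versions of dplyr and ggplot2. The names `species_summary`, `label` and `total` are introduced here for illustration and are not part of the original answer.

```r
library(dplyr)
library(ggplot2)

# Example data from the question
df <- data.frame(
  ID      = paste0("ID", 1:7),
  region  = c("north", "north", "north", "north", "south", "south", "south"),
  species = c("A", "B", "B", "C", "A", "A", "C")
)

# species_summary is an illustrative name, not from the original answer
species_summary <- df %>%
  count(region, species) %>%                    # one row per region/species combination
  group_by(region) %>%                          # percentages are computed within each region
  mutate(pct   = n / sum(n),
         label = sprintf("%.1f%%", 100 * pct)) %>%
  ungroup()

# Sanity check: the shares within each region should add up to 1
species_summary %>%
  group_by(region) %>%
  summarise(total = sum(pct))

# Counts on the y-axis, within-region percentages as centered labels
p <- ggplot(species_summary, aes(region, n, fill = species)) +
  geom_col() +                                  # same as geom_bar(stat = "identity")
  geom_text(aes(label = label), position = position_stack(vjust = 0.5))
p
```

Keeping the summary in its own object makes it easy to inspect the numbers behind the labels, and the explicit group_by(region) does not rely on whatever grouping count() happens to return.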