R: ggplot stacked bar chart with counts on y axis but percentage as label


Problem description


I'm looking for a way to label a stacked bar chart with percentages while the y-axis shows the original count (using ggplot). Here is a MWE for the plot without labels:

library(ggplot2)
df <- as.data.frame(matrix(nrow = 7, ncol= 3,
                       data = c("ID1", "ID2", "ID3", "ID4", "ID5", "ID6", "ID7",
                                "north", "north", "north", "north", "south", "south", "south",
                                "A", "B", "B", "C", "A", "A", "C"),
                      byrow = FALSE))

colnames(df) <- c("ID", "region", "species")

p <- ggplot(df, aes(x = region, fill = species))
p  + geom_bar()

I have a much larger table, and R counts the different species for every region quite nicely. Now I would like to show both the original count value (preferably on the y-axis) and the percentage (as a label) to compare the proportions of species between regions.

I tried many things using geom_text(), but I think the main difference from other questions (e.g. this one) is that

  • I do not have a separate column for y values (they are just the counts of different species per region), and
  • I need the labels per region to sum up to 100% (since the regions are considered to represent separate populations), rather than all labels of the entire plot summing to 100%.
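The per-region percentages described above are ordinary row proportions of a region-by-species count table. As a quick check outside the plot, here is a minimal base-R sketch. It assumes the same seven-row example data, rebuilt with `data.frame()` instead of `matrix()` (same values), and uses `table()` and `prop.table()` to divide each region's counts by that region's own total.

```r
# Sketch: rebuild the MWE's example data (same values as in the question)
df <- data.frame(
  ID      = paste0("ID", 1:7),
  region  = c("north", "north", "north", "north", "south", "south", "south"),
  species = c("A", "B", "B", "C", "A", "A", "C")
)

# Raw counts: one row per region, one column per species
counts <- table(df$region, df$species)

# Row proportions: each region's counts divided by that region's total,
# so every row sums to 1 (i.e. 100%)
shares <- prop.table(counts, margin = 1)

print(counts)
print(round(100 * shares, 1))
```

Each row of `shares` sums to 1, which is the normalisation the labels need; dividing by the total over all seven rows would instead give shares of the whole plot.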

Any help is much appreciated!!

Solution

As @Gregor mentioned, summarize the data separately and then feed the data summary to ggplot. In the code below, we use dplyr to create the summary on the fly:

library(ggplot2)
library(dplyr)

ggplot(df %>% count(region, species) %>%    # Count rows for each region/species combination
         group_by(region) %>%               # Current dplyr returns ungrouped counts, so regroup before sum(n)
         mutate(pct = n/sum(n),             # Calculate percent within each region
                ypos = rev(cumsum(rev(n))) - 0.5*n),  # Label positions; ggplot2 >= 2.2.0 stacks the first species on top
       aes(region, n, fill=species)) +
  geom_bar(stat="identity") +
  geom_text(aes(label=paste0(sprintf("%1.1f", pct*100),"%"), y=ypos))

Update: With ggplot2 2.2.0 and later, you no longer need to provide a y-value to center the text within each bar. Instead you can use position_stack(vjust=0.5):

ggplot(df %>% count(region, species) %>%    # Count rows for each region/species combination
         group_by(region) %>%               # Regroup so that sum(n) is taken within each region
         mutate(pct=n/sum(n)),              # Calculate percent within each region
       aes(region, n, fill=species)) +
  geom_bar(stat="identity") +
  geom_text(aes(label=paste0(sprintf("%1.1f", pct*100),"%")), 
            position=position_stack(vjust=0.5))
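If dplyr is not available, the same summary can be built in base R. The sketch below is an alternative to the `count()`/`mutate()` pipeline, not part of the original answer. It assumes `df` is the example data from the question (rebuilt here so the block runs on its own), turns the contingency table into a long data frame with `as.data.frame()`, drops empty region/species combinations, and uses `ave()` to compute each count's share of its own region. The `label` column is an illustrative addition.

```r
# Example data from the question (rebuilt so this block is self-contained)
df <- data.frame(
  ID      = paste0("ID", 1:7),
  region  = c("north", "north", "north", "north", "south", "south", "south"),
  species = c("A", "B", "B", "C", "A", "A", "C")
)

# One row per region/species combination, with the count in column `n`
summary_df <- as.data.frame(table(region = df$region, species = df$species),
                            responseName = "n")

# Drop combinations that never occur (e.g. species B in the south)
summary_df <- summary_df[summary_df$n > 0, ]

# Share of each count within its own region; ave() applies FUN per region
summary_df$pct <- ave(summary_df$n, summary_df$region,
                      FUN = function(x) x / sum(x))

# Label text such as "66.7%"
summary_df$label <- sprintf("%.1f%%", 100 * summary_df$pct)

print(summary_df)
```

`summary_df` has the same `region`, `species`, `n` and `pct` columns as the dplyr result, so it could be passed to `ggplot(summary_df, aes(region, n, fill = species))` with the same `geom_bar(stat = "identity")` and `position_stack(vjust = 0.5)` layers.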
