Adding Percentages to Grouped Barchart Columns in ggplot2


Question

Hoping someone can help me with labelling the columns of a grouped barchart with percentages. I couldn't find an existing post that I could make work successfully. Below is the code for a basic example data frame.

Service<-c("AS","AS","PS","PS","RS","RS","ES","ES")

Year<-c("2015","2016","2015","2016","2015","2016","2015","2016")

Q1<-c("Dissatisfied","Satisfied","Satisfied","Satisfied","Dissatisfied","Dissatisfied","Satisfied","Satisfied")

Q2<-c("Dissatisfied","Dissatisfied","Satisfied","Dissatisfied","Dissatisfied","Satisfied","Satisfied","Satisfied")

Example<-data.frame(Service,Year,Q1,Q2)

Next, I melted it with Reshape2 so that I could plot the Q1 and Q2 column variables along the x-axis. I then created a basic grouped barchart with ggplot2, with counts on the y-axis, and then a facet by year.

library(reshape2)  # provides melt()
library(ggplot2)

ExampleM<-melt(Example,id.vars=c("Service","Year"))

ggplot(ExampleM,aes(x=variable,stat="identity",fill=value)) + 
  geom_bar(position="dodge") + facet_grid(~Year)

What I'm struggling with is how to add column labels. Specifically I would like to know how to add basic frequency counts, as well as percentages. Not both together, but one or the other. I can't make either work. I've tried using "+geom_text(aes(labels=" but I'm not sure what to put as the label since I used stat="identity" in the ggplot code.

Also, for percentages, do I need to calculate it with dplyr first, or can I calculate the percentages within the ggplot code? I also don't know enough about labels in R, so not sure about how to add the actual % sign.

Hoping someone can show me a basic way to achieve all this!

Recommended answer

You can add counts as text using stat_count with geom="text". ..count.. is the internal variable that ggplot creates to hold the count values. The example below shows how to add both counts and percentages using stat_count, though you can, of course, choose to include only one of them.

stat="identity" doesn't do anything inside aes. You would normally put it inside the geom. But in this case you don't want stat="identity" because you actually want ggplot to count the number of values in each category. You would use stat="identity" with geom_bar if you were using a data frame with a column that already contained the counts for each category.

To create the label text, use paste0 to combine the calculated values (e.g., ..count../sum(..count..)*100 is the percentage) with text like the % sign. Also, in this case I've used the newline character (\n) to put the percentage and count on separate lines. sprintf is a formatting function that in this case produces values rounded to one decimal place (see note 1 below).

ggplot(ExampleM, aes(x=variable, fill=value)) + 
  geom_bar(position="dodge") + 
  stat_count(aes(label=paste0(sprintf("%1.1f", ..count../sum(..count..)*100),
                              "%\n", ..count..), y=0.5*..count..), 
             geom="text", colour="white", size=4, position=position_dodge(width=1)) +
  facet_grid(~Year)
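Since the question asked for counts or percentages rather than both, here is a minimal sketch of each variant on its own. It assumes ggplot2 3.3.0 or later, where after_stat(count) is the replacement for the ..count.. notation (which newer releases flag as deprecated). The data frame is rebuilt by hand so the snippet runs without reshape2, and the names p_count and p_pct are only illustrative.

```r
library(ggplot2)

# Long-format data equivalent to ExampleM (Service column omitted, not needed here)
ExampleM <- data.frame(
  Year     = rep(c("2015", "2016"), times = 8),
  variable = rep(c("Q1", "Q2"), each = 8),
  value    = c("Dissatisfied", "Satisfied", "Satisfied", "Satisfied",
               "Dissatisfied", "Dissatisfied", "Satisfied", "Satisfied",
               "Dissatisfied", "Dissatisfied", "Satisfied", "Dissatisfied",
               "Dissatisfied", "Satisfied", "Satisfied", "Satisfied")
)

dodge <- position_dodge(width = 0.9)

# Variant 1: frequency counts only, drawn just inside the top of each bar
p_count <- ggplot(ExampleM, aes(x = variable, fill = value)) +
  geom_bar(position = dodge) +
  geom_text(aes(label = after_stat(count)), stat = "count",
            position = dodge, vjust = 1.5, colour = "white") +
  facet_grid(~Year)

# Variant 2: percentages only, formatted to one decimal place with a % sign
p_pct <- ggplot(ExampleM, aes(x = variable, fill = value)) +
  geom_bar(position = dodge) +
  geom_text(aes(label = after_stat(paste0(sprintf("%1.1f",
                                                  count / sum(count) * 100), "%"))),
            stat = "count", position = dodge, vjust = 1.5, colour = "white") +
  facet_grid(~Year)

print(p_count)
print(p_pct)
```

Using the same position_dodge width for the bars and the text keeps each label over its own bar.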

Here's an example where you pre-summarize the data and use stat="identity" when plotting it: Say that instead of the percentages being the percent of all values, you want percentages within each quarter. Let's also stack the bars and add the percentages to the bars as text:

First, create the data summary. We'll use dplyr so that we can use the chaining (%>%) operator. We'll count the number of values, calculate percentages within each combination of Year and variable and we'll also add n.pos to provide y-values for the text location in a stacked bar plot.

library(dplyr)

summary = ExampleM %>% group_by(Year, variable, value) %>%
  tally %>%
  group_by(Year, variable) %>%
  mutate(pct = n/sum(n),
         n.pos = cumsum(n) - 0.5*n)

Now for the plot. Note that we supply y=n. Since we've pre-summarized the data (rather than having counts and percentages calculated inside geom_bar) we need stat="identity".

ggplot(summary, aes(x=variable, y=n, fill=value)) +
  geom_bar(stat="identity") +
  facet_grid(.~Year) + 
  geom_text(aes(label=paste0(sprintf("%1.1f", pct*100),"%"), y=n.pos), 
            colour="white") 
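A possible alternative to computing n.pos by hand: ggplot2 2.2.0 and later provide position_stack(vjust = 0.5), which centres each label inside its own bar segment. Those versions also changed the default stacking order, so a manually computed cumulative position may not line up with the segment it was meant for. The sketch below assumes a recent ggplot2 and dplyr; summary_df and p_stack are illustrative names, and the data frame is rebuilt by hand so the snippet is self-contained.

```r
library(dplyr)
library(ggplot2)

# Long-format data equivalent to ExampleM (Service column omitted, not needed here)
ExampleM <- data.frame(
  Year     = rep(c("2015", "2016"), times = 8),
  variable = rep(c("Q1", "Q2"), each = 8),
  value    = c("Dissatisfied", "Satisfied", "Satisfied", "Satisfied",
               "Dissatisfied", "Dissatisfied", "Satisfied", "Satisfied",
               "Dissatisfied", "Dissatisfied", "Satisfied", "Dissatisfied",
               "Dissatisfied", "Satisfied", "Satisfied", "Satisfied")
)

# Count each Year/variable/value combination, then compute the share
# of each value within its Year/variable group
summary_df <- ExampleM %>%
  count(Year, variable, value) %>%
  group_by(Year, variable) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

# geom_col() is geom_bar(stat = "identity"); position_stack(vjust = 0.5)
# places each label at the vertical midpoint of its segment
p_stack <- ggplot(summary_df, aes(x = variable, y = n, fill = value)) +
  geom_col() +
  geom_text(aes(label = paste0(sprintf("%1.1f", pct * 100), "%")),
            position = position_stack(vjust = 0.5), colour = "white") +
  facet_grid(. ~ Year)

print(p_stack)
```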

Note 1: You can use round instead, but I prefer sprintf because it keeps a zero in the decimal place even when the decimal part is zero, while round returns just the integer part when the decimal part is zero. For example, compare round(3.04, 1) and sprintf("%1.1f", 3.04).
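The comparison in the note can be run directly in base R. This is a small illustration only; the variable names are made up for the example.

```r
x <- 3.04

rounded   <- round(x, 1)          # numeric value; prints without a trailing zero
formatted <- sprintf("%1.1f", x)  # character string with exactly one decimal place

# Pasting each into a percentage label shows the difference
label_round   <- paste0(rounded, "%")    # "3%"
label_sprintf <- paste0(formatted, "%")  # "3.0%"

print(label_round)
print(label_sprintf)
```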

UPDATE: To answer the questions in your comments:

  1. What's the reason for the second "group_by line"? We've calculated counts for each combination of Year, variable, and value. Now, we want to know, within each combination of Year and variable, what percent had value="Satisfied" and what percent had value="Dissatisfied". For that, we only want to group by Year and variable.

  2. Explain the y=n.pos line. This is where we calculate the y-position for each percent label. We want the label in the middle of each bar section, but the bars are stacked. If we used just cumsum(n), the labels would be at the top of each bar section. We subtract 0.5*n so that the y-position of each label is reduced by half the height of the bar section containing that label.

Here's an example: Say we have three bar sections with heights 1, 2, and 3 (stacked from bottom to top in that order) and we want to calculate the y-positions for our labels.

h = 1:3
cumsum(h) # 1 3 6
0.5 * h   # 0.5 1.0 1.5
cumsum(h) - 0.5 * h  # 0.5 2.0 4.5

This gives y-positions that vertically center the label within each bar section.

  3. How can I order the x-axis columns in descending order of percentages? By default, ggplot orders a discrete x-axis by the ordering of the categories of the x variable. For a character variable, the ordering will be alphabetic. For a factor variable, the ordering will be the ordering of the levels of the factor.

In my example, the levels of summary$variable are as follows:

levels(summary$variable)
[1] "Q1" "Q2"

To reorder by pct, one way would be with the reorder function. Compare these (using the summary data frame from above):

summary$pct2 = summary$pct + c(0.3, -0.15, -0.45, -0.4, -0.1, -0.2, -0.15, -0.1)

ggplot(summary, aes(x=variable, y=pct2, fill=value)) +
  geom_bar(position="stack", stat="identity") +
  facet_grid(~Year) 

ggplot(summary, aes(x=reorder(variable, pct2), y=pct2, fill=value)) +
  geom_bar(position="stack", stat="identity") +
  facet_grid(~Year) 

Notice that in the second plot, the order of "Q1" and "Q2" has now reversed. However, notice that in the left panel the Q1 stack is taller, while in the right panel the Q2 stack is taller. With faceting you get the same x-axis ordering in each panel, because reorder is applied to the whole data frame before it is split into panels. By default reorder summarises the values for each level with mean, so the order is determined by comparing the mean of all Q1 values with the mean of all Q2 values (here Q1 and Q2 have the same number of rows, so this is equivalent to comparing their sums). The Q2 values are smaller overall, so Q2 goes first. The same happens when you use position="dodge", but I used "stack" to make it easier to see what's happening. The examples below will hopefully help clarify things.

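# Fake data
values = c(4.5,1.5,2,1,2,4)
# stringsAsFactors=TRUE keeps the grouping columns as factors in R >= 4.0,
# where data.frame() no longer converts strings to factors by default
dat = data.frame(group1=rep(letters[1:3], 2), group2=LETTERS[1:6], 
                 group3=rep(c("W","Z"),3), pct=values/sum(values),
                 stringsAsFactors=TRUE)

levels(dat$group2)
# [1] "A" "B" "C" "D" "E" "F"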

# plot group2 in its factor order
ggplot(dat, aes(group2, pct)) +
  geom_bar(stat="identity", position="stack", colour="red", lwd=1)

# plot group2, ordered by -pct
ggplot(dat, aes(reorder(group2, -pct), pct)) +
  geom_bar(stat="identity", colour="red", lwd=1)

# plot group1 ordered by pct, with stacking
ggplot(dat, aes(reorder(group1, pct), pct)) +
  geom_bar(stat="identity", position="stack", colour="red", lwd=1) 

# Note that in the next two examples, the x-axis order is b, a, c, 
# regardless of whether you use faceting
ggplot(dat, aes(reorder(group1, pct), pct)) +
  geom_bar(stat="identity", position="stack", colour="red", lwd=1) +
  facet_grid(.~group3) 

ggplot(dat, aes(reorder(group1, pct), pct, fill=group3)) +
  geom_bar(stat="identity", position="stack", colour="red", lwd=1) 
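To see how reorder arrives at a single ordering, it can help to run it outside of ggplot. The sketch below uses a small made-up vector (variable and score are illustrative, not taken from the data above) and base R only. By default reorder summarises each level with mean; passing FUN = sum makes the comparison by totals explicit.

```r
# Hypothetical values: two rows per level
variable <- factor(c("Q1", "Q1", "Q2", "Q2"))
score    <- c(0.8, 0.6, 0.1, 0.3)

# Per-level summaries that reorder() compares
level_means <- tapply(score, variable, mean)

# Default: levels sorted by increasing mean of score
by_mean <- levels(reorder(variable, score))

# Explicit summary function: levels sorted by increasing sum of score
by_sum <- levels(reorder(variable, score, FUN = sum))

# Descending order: negate the values, as in reorder(group2, -pct) above
by_mean_desc <- levels(reorder(variable, -score))

print(level_means)
print(by_mean)
print(by_sum)
print(by_mean_desc)
```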

For more on ordering axis values by setting factor orders, this blog post might be helpful.
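As a rough sketch of ordering an axis by setting the factor levels directly (rather than calling reorder inside aes), the levels can be assigned in the desired order before plotting. The data frame below mirrors the fake data used above, and p_levels is an illustrative name.

```r
library(ggplot2)

values <- c(4.5, 1.5, 2, 1, 2, 4)
dat <- data.frame(group2 = LETTERS[1:6], pct = values / sum(values))

# Set the factor levels in descending order of pct;
# ggplot then draws the x-axis in level order
dat$group2 <- factor(dat$group2, levels = dat$group2[order(-dat$pct)])

levels(dat$group2)  # "A" comes first (largest pct), "D" last (smallest)

p_levels <- ggplot(dat, aes(group2, pct)) +
  geom_col()

print(p_levels)
```

Setting the levels on the data frame makes the ordering explicit and reusable across several plots, at the cost of modifying the data.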
