Adding Percentages to Grouped Barchart Columns in ggplot2


Problem description

Hoping someone can help me with labelling the columns of a grouped barchart with percentages. I couldn't find an existing post that I could make work successfully. Below is the code for a basic example dataframe.

Service<-c("AS","AS","PS","PS","RS","RS","ES","ES")

Year<-c("2015","2016","2015","2016","2015","2016","2015","2016")

Q1<-c("Dissatisfied","Satisfied","Satisfied","Satisfied","Dissatisfied","Dissatisfied","Satisfied","Satisfied")

Q2<-c("Dissatisfied","Dissatisfied","Satisfied","Dissatisfied","Dissatisfied","Satisfied","Satisfied","Satisfied")

Example<-data.frame(Service,Year,Q1,Q2)

Next, I melted it with reshape2 so that I could plot the Q1 and Q2 column variables along the x-axis. I then created a basic grouped barchart with ggplot2, with counts on the y-axis and a facet by year.

library(reshape2)
library(ggplot2)

ExampleM<-melt(Example,id.vars=c("Service","Year"))

ggplot(ExampleM,aes(x=variable,stat="identity",fill=value)) + 
  geom_bar(position="dodge") + facet_grid(~Year)

What I'm struggling with is how to add column labels. Specifically I would like to know how to add basic frequency counts, as well as percentages. Not both together, but one or the other. I can't make either work. I've tried using "+geom_text(aes(labels=" but I'm not sure what to put as the label since I used stat="identity" in the ggplot code.

Also, for percentages, do I need to calculate it with dplyr first, or can I calculate the percentages within the ggplot code? I also don't know enough about labels in R, so not sure about how to add the actual % sign.

Hoping someone can show me a basic way to achieve all this!

Solution

You can add counts as text using stat_count with geom="text". ..count.. is the internal variable that ggplot creates to hold the count values. The example below shows how to add both counts and percentages using stat_count, though you can, of course, choose to include only one of them.

stat="identity" doesn't do anything inside aes. You would normally put it inside the geom. But in this case you don't want stat="identity" because you actually want ggplot to count the number of values in each category. You would use stat="identity" with geom_bar if you were using a data frame with a column that already contained the counts for each category.

To create the label text, use paste0 to combine the calculated values (e.g., ..count../sum(..count..)*100 is the percentage) with text like the % sign. Also, in this case I've used the newline character \n to put the percentage and count on separate lines. sprintf is a formatting function that in this case produces values rounded to one decimal place. [1]
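The label construction can be tried on its own, outside ggplot. This is only a minimal sketch in base R; the vector `counts` is a hypothetical stand-in for the counts ggplot computes internally, not something taken from the plot.

```r
# Hypothetical counts for four bars (an assumption for illustration)
counts <- c(3, 1, 2, 2)

# Percentage of all values, formatted to one decimal place
pct <- sprintf("%1.1f", counts / sum(counts) * 100)

# Combine the percentage, a literal % sign, a newline, and the raw count
labels <- paste0(pct, "%\n", counts)

# Print each two-line label, separated by a rule
cat(labels, sep = "\n---\n")
```

Because the % sign is added by paste0 rather than inside the sprintf format string, it does not need to be escaped as %%.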

ggplot(ExampleM, aes(x=variable, fill=value)) + 
  geom_bar(position="dodge") + 
  stat_count(aes(label=paste0(sprintf("%1.1f", ..count../sum(..count..)*100),
                              "%\n", ..count..), y=0.5*..count..), 
             geom="text", colour="white", size=4, position=position_dodge(width=1)) +
  facet_grid(~Year)
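If you only want the counts, a shorter variant may be enough. The sketch below makes a few assumptions: it rebuilds the melted data by hand with base R (omitting the Service column) so that it runs without reshape2, it stores the plot in a hypothetical object `p`, and it uses after_stat(count), which as far as I know replaces the ..count.. notation in ggplot2 3.3.0 and later (..count.. is deprecated from 3.4.0).

```r
library(ggplot2)

# Rebuild the melted example data by hand (same Year/variable/value rows
# that melt() produces; Service is left out because the plot does not use it)
Year <- c("2015","2016","2015","2016","2015","2016","2015","2016")
Q1 <- c("Dissatisfied","Satisfied","Satisfied","Satisfied",
        "Dissatisfied","Dissatisfied","Satisfied","Satisfied")
Q2 <- c("Dissatisfied","Dissatisfied","Satisfied","Dissatisfied",
        "Dissatisfied","Satisfied","Satisfied","Satisfied")
ExampleM <- data.frame(Year = rep(Year, 2),
                       variable = rep(c("Q1", "Q2"), each = 8),
                       value = c(Q1, Q2))

# Dodged bars with the count printed just above each bar
p <- ggplot(ExampleM, aes(x = variable, fill = value)) +
  geom_bar(position = position_dodge(width = 0.9)) +
  geom_text(aes(label = after_stat(count)),
            stat = "count",
            position = position_dodge(width = 0.9),
            vjust = -0.3) +
  facet_grid(~Year)

print(p)
```

Using the same dodge width for the bars and the text keeps each label centred over its own bar.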

Here's an example where you pre-summarize the data and use stat="identity" when plotting it: Say that instead of the percentages being the percent of all values, you want percentages within each quarter. Let's also stack the bars and add the percentages to the bars as text:

First, create the data summary. We'll use dplyr so that we can use the chaining (%>%) operator. We'll count the number of values, calculate percentages within each combination of Year and variable and we'll also add n.pos to provide y-values for the text location in a stacked bar plot.

library(dplyr)

summary = ExampleM %>% group_by(Year, variable, value) %>%
  tally %>%
  group_by(Year, variable) %>%
  mutate(pct = n/sum(n),
         n.pos = cumsum(n) - 0.5*n)
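To see what the grouped mutate computes, here is a rough base-R equivalent using ave(). It is a sketch, not the code used above: the data frame `counts` is typed in by hand with the counts I get when tallying the example data, so treat those numbers as an assumption.

```r
# One row per Year/variable/value combination, with hand-entered counts
counts <- data.frame(
  Year     = rep(c("2015", "2016"), each = 4),
  variable = rep(rep(c("Q1", "Q2"), each = 2), 2),
  value    = rep(c("Dissatisfied", "Satisfied"), 4),
  n        = c(2, 2, 2, 2, 1, 3, 2, 2)
)

# Share of each value within its Year/variable group
counts$pct <- counts$n / ave(counts$n, counts$Year, counts$variable, FUN = sum)

# Midpoint of each stacked segment: running total minus half the segment height
counts$n.pos <- ave(counts$n, counts$Year, counts$variable, FUN = cumsum) -
  0.5 * counts$n

print(counts)
```

Within every Year/variable group the pct values sum to 1, which is what grouping by only Year and variable is meant to achieve.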

Now for the plot. Note that we supply y=n. Since we've pre-summarized the data (rather than having counts and percentages calculated inside geom_bar) we need stat="identity".

ggplot(summary, aes(x=variable, y=n, fill=value)) +
  geom_bar(stat="identity") +
  facet_grid(.~Year) + 
  geom_text(aes(label=paste0(sprintf("%1.1f", pct*100),"%"), y=n.pos), 
            colour="white") 
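One caveat, as far as I understand it: from ggplot2 2.2.0 the default stacking order was reversed, so hand-computed positions such as n.pos can end up in the wrong segment. A possible alternative is to let ggplot place the labels with position_stack(vjust = 0.5). The sketch below assumes a hand-entered summary table `counts` and a hypothetical plot object `p`.

```r
library(ggplot2)

# Hand-entered counts per Year/variable/value (an assumption for illustration)
counts <- data.frame(
  Year     = rep(c("2015", "2016"), each = 4),
  variable = rep(rep(c("Q1", "Q2"), each = 2), 2),
  value    = rep(c("Dissatisfied", "Satisfied"), 4),
  n        = c(2, 2, 2, 2, 1, 3, 2, 2)
)
counts$pct <- counts$n / ave(counts$n, counts$Year, counts$variable, FUN = sum)

# geom_col() is geom_bar(stat = "identity"); position_stack(vjust = 0.5)
# puts each label at the vertical centre of its own segment
p <- ggplot(counts, aes(x = variable, y = n, fill = value)) +
  geom_col() +
  geom_text(aes(label = paste0(sprintf("%1.1f", pct * 100), "%")),
            position = position_stack(vjust = 0.5),
            colour = "white") +
  facet_grid(. ~ Year)

print(p)
```

With this approach no position column is needed, and the labels follow the bars if the stacking order changes.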

[1] You can use round instead, but I prefer sprintf because it keeps a zero in the decimal place even when the decimal part is zero, while round returns just the integer part when the decimal part is zero. For example, compare round(3.04, 1) and sprintf("%1.1f", 3.04).
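The difference is easy to check at the console. The variable names below are hypothetical, and the format() call is an extra option not mentioned above.

```r
x <- 3.04

rounded   <- round(x, 1)                      # numeric; prints as 3
formatted <- sprintf("%1.1f", x)              # character string "3.0"
padded    <- format(round(x, 1), nsmall = 1)  # also "3.0", via format()

print(rounded)
print(formatted)
print(padded)
```

Note that sprintf and format return character strings, which is fine for labels but means the result can no longer be used in arithmetic.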

UPDATE: To answer the questions in your comments:

  1. What's the reason for the second "group_by line"? We've calculated counts for each combination of Year, variable, and value. Now, we want to know, within each combination of Year and variable, what percent had value="Satisfied" and what percent had value="Dissatisfied". For that, we only want to group by Year and variable.

  2. Explain the y=n.pos line. This is where we calculate the y-position for each percent label. We want the label in the middle of each bar, but the bars are stacked. If we used just cumsum(n) the labels would be at the top of each bar section. We subtract 0.5*n so that the y-position of each label will be reduced by half the height of the bar section containing that label.

    Here's an example: Say we have three bar sections with heights 1, 2, and 3 (stacked from bottom to top in that order) and we want to calculate the y-positions for our labels.

    h = 1:3
    cumsum(h) # 1 3 6
    0.5 * h   # 0.5 1.0 1.5
    cumsum(h) - 0.5 * h  # 0.5 2.0 4.5
    

    This gives y-positions that vertically center the label within each bar section.

  3. How can I order the x-axis columns in descending order of percentages? By default, ggplot orders a discrete x-axis by the ordering of the categories of the x variable. For a character variable, the ordering will be alphabetic. For a factor variable, the ordering will be the ordering of the levels of the factor.

    In my example, the levels of summary$variable are as follows:

    levels(summary$variable)
    [1] "Q1" "Q2"
    

    To reorder by pct, one way would be with the reorder function. Compare these (using the summary data frame from above):

    summary$pct2 = summary$pct + c(0.3, -0.15, -0.45, -0.4, -0.1, -0.2, -0.15, -0.1)
    
    ggplot(summary, aes(x=variable, y=pct2, fill=value)) +
      geom_bar(position="stack", stat="identity") +
      facet_grid(~Year) 
    
    ggplot(summary, aes(x=reorder(variable, pct2), y=pct2, fill=value)) +
      geom_bar(position="stack", stat="identity") +
      facet_grid(~Year) 
    

    Notice that in the second plot, the order of "Q1" and "Q2" has now reversed. However, notice that in the left panel the Q1 stack is taller, while in the right panel the Q2 stack is taller. With faceting you get the same x-axis ordering in each panel, because reorder ranks the categories once over the whole data frame. By default reorder compares the mean of the values in each category (FUN = mean); Q1 and Q2 each have four rows here, so this gives the same order as comparing the sum of all Q1 values with the sum of all Q2 values. The Q2 values are smaller, so Q2 goes first. The same happens when you use position="dodge", but I used "stack" to make it easier to see what's happening. The examples below will hopefully help clarify things.

    # Fake data
    values = c(4.5,1.5,2,1,2,4)
    dat = data.frame(group1=rep(letters[1:3], 2), group2=LETTERS[1:6], 
                     group3=rep(c("W","Z"),3), pct=values/sum(values),
                     stringsAsFactors=TRUE)  # needed in R >= 4.0 so that levels() below works
    
    levels(dat$group2)
    [1] "A" "B" "C" "D" "E" "F"
    
    # plot group2 in its factor order
    ggplot(dat, aes(group2, pct)) +
      geom_bar(stat="identity", position="stack", colour="red", lwd=1)
    
    # plot group2, ordered by -pct
    ggplot(dat, aes(reorder(group2, -pct), pct)) +
      geom_bar(stat="identity", colour="red", lwd=1)
    
    # plot group1 ordered by pct, with stacking
    ggplot(dat, aes(reorder(group1, pct), pct)) +
      geom_bar(stat="identity", position="stack", colour="red", lwd=1) 
    
    # Note that in the next two examples, the x-axis order is b, a, c, 
    # regardless of whether you use faceting
    ggplot(dat, aes(reorder(group1, pct), pct)) +
      geom_bar(stat="identity", position="stack", colour="red", lwd=1) +
      facet_grid(.~group3) 
    
    ggplot(dat, aes(reorder(group1, pct), pct, fill=group3)) +
      geom_bar(stat="identity", position="stack", colour="red", lwd=1) 
    

    For more on ordering axis values by setting factor orders, this blog post might be helpful.
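The ordering that reorder produces in point 3 can be inspected without drawing anything. This sketch reuses the fake data values from above in a smaller, hypothetical data frame and only looks at the resulting factor levels.

```r
# Same fake values as in the example above
values <- c(4.5, 1.5, 2, 1, 2, 4)
dat <- data.frame(group1 = rep(letters[1:3], 2),
                  pct = values / sum(values))

# Default: categories ranked by the mean of pct within each category
by_mean <- levels(reorder(dat$group1, dat$pct))

# Explicitly rank by the sum instead (same order here, since every
# category has the same number of rows)
by_sum <- levels(reorder(dat$group1, dat$pct, FUN = sum))

# Negate the values to get descending order
descending <- levels(reorder(dat$group1, -dat$pct))

print(by_mean)
print(by_sum)
print(descending)
```

Passing FUN explicitly is worth considering when categories have different numbers of rows, because the mean and the sum can then rank them differently.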
