Show percent % instead of counts in charts of categorical variables

Question

I'm plotting a categorical variable and, instead of showing the counts for each category value, I want to show percentages.

I'm looking for a way to get ggplot to display the percentage of values in each category. Of course, it is possible to create another variable with the calculated percentage and plot that one, but I have to do this dozens of times and I hope to achieve it in one command.

I was experimenting with something like

qplot(mydataf) +
  stat_bin(aes(n = nrow(mydataf), y = ..count../n)) +
  scale_y_continuous(formatter = "percent")

but I must be using it incorrectly, as I got errors.

To easily reproduce the setup, here's a simplified example:

mydata <- c ("aa", "bb", NULL, "bb", "cc", "aa", "aa", "aa", "ee", NULL, "cc");
mydataf <- factor(mydata);
qplot (mydataf); #this shows the count, I'm looking to see % displayed.

In the real case, I'll probably use ggplot instead of qplot, but the right way to use stat_bin still eludes me.

I've also tried these four approaches:

ggplot(mydataf, aes(y = (..count..)/sum(..count..))) + 
  scale_y_continuous(formatter = 'percent');

ggplot(mydataf, aes(y = (..count..)/sum(..count..))) + 
  scale_y_continuous(formatter = 'percent') + geom_bar();

ggplot(mydataf, aes(x = levels(mydataf), y = (..count..)/sum(..count..))) + 
  scale_y_continuous(formatter = 'percent');

ggplot(mydataf, aes(x = levels(mydataf), y = (..count..)/sum(..count..))) + 
  scale_y_continuous(formatter = 'percent') + geom_bar();

but all 4 give:

Error: ggplot2 doesn't know how to deal with data of class factor

The same error appears for the simpler case of

ggplot (data=mydataf, aes(levels(mydataf))) +
  geom_bar()

so it's clearly something about how ggplot interacts with a single vector. I'm scratching my head, googling for that error gives a single result.

Recommended answer

Since this was answered there have been some meaningful changes to the ggplot syntax. Summing up the discussion in the comments above:

 require(ggplot2)
 require(scales)

 p <- ggplot(mydataf, aes(x = foo)) +  
        geom_bar(aes(y = (..count..)/sum(..count..))) + 
        ## version 3.0.0
        scale_y_continuous(labels=percent)
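
To connect this to the question's own data, here is a minimal sketch rather than a definitive fix. It assumes a recent ggplot2 in which `after_stat()` is available (newer releases deprecate the `..count..` notation in its favor). The column name `category` is a placeholder that does not appear in the original post. The key point is that `ggplot()` wants a data frame, which is why passing the bare factor fails.

```r
library(ggplot2)
library(scales)

# The question's vector; c() silently drops the NULL entries, leaving 9 values.
mydata <- c("aa", "bb", NULL, "bb", "cc", "aa", "aa", "aa", "ee", NULL, "cc")

# ggplot() expects a data frame, so wrap the vector in one.
# `category` is a hypothetical column name chosen for this sketch.
df <- data.frame(category = factor(mydata))

p <- ggplot(df, aes(x = category)) +
  # after_stat() refers to the count computed by the bar stat,
  # so each bar's height is its share of all observations.
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = percent) +
  ylab("percent")

print(p)
```

Because the proportion is computed inside the stat, no extra percentage column has to be created beforehand.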

Here's a reproducible example using mtcars:

 ggplot(mtcars, aes(x = factor(hp))) +  
        geom_bar(aes(y = (..count..)/sum(..count..))) + 
        scale_y_continuous(labels = percent) ## version 3.0.0
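
Since the question mentions having to repeat this for dozens of variables, the same idea could be wrapped in a small helper. This is only a sketch: `plot_percent` is a hypothetical function name, not part of ggplot2, and it assumes a ggplot2 version that supports `after_stat()` and the `.data` pronoun inside `aes()`.

```r
library(ggplot2)
library(scales)

# Hypothetical helper: bar chart of one column, y axis as share of all rows.
# `data` is a data frame, `var` is a column name given as a string.
plot_percent <- function(data, var) {
  ggplot(data, aes(x = factor(.data[[var]]))) +
    geom_bar(aes(y = after_stat(count / sum(count)))) +
    scale_y_continuous(labels = percent) +
    labs(x = var, y = "percent")
}

# One call per variable.
p_cyl <- plot_percent(mtcars, "cyl")
p_gear <- plot_percent(mtcars, "gear")

print(p_cyl)
```

Converting the column with `factor()` inside the helper keeps numeric columns such as `cyl` from being treated as continuous.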

This question is currently the #1 hit on google for 'ggplot count vs percentage histogram' so hopefully this helps distill all the information currently housed in comments on the accepted answer.

Remark: If hp is not set as a factor, ggplot returns:
