Show percent % instead of counts in charts of categorical variables
Question
I'm plotting a categorical variable and, instead of showing the counts for each category value, I'm looking for a way to get ggplot to display the percentage of values in that category. Of course, it is possible to create another variable with the calculated percentage and plot that one, but I have to do it several dozen times and I hope to achieve that in one command.
I'm experimenting with something like:
qplot(mydataf) +
stat_bin(aes(n = nrow(mydataf), y = ..count../n)) +
scale_y_continuous(formatter = "percent")
but I must be using it incorrectly, as I got errors.
To easily reproduce the setup, here's a simplified example:
mydata <- c ("aa", "bb", NULL, "bb", "cc", "aa", "aa", "aa", "ee", NULL, "cc");
mydataf <- factor(mydata);
qplot (mydataf); #this shows the count, I'm looking to see % displayed.
In the real case, I'll probably use ggplot instead of qplot, but the right way to use stat_bin still eludes me.
I've also tried these four approaches:
ggplot(mydataf, aes(y = (..count..)/sum(..count..))) +
scale_y_continuous(formatter = 'percent');
ggplot(mydataf, aes(y = (..count..)/sum(..count..))) +
scale_y_continuous(formatter = 'percent') + geom_bar();
ggplot(mydataf, aes(x = levels(mydataf), y = (..count..)/sum(..count..))) +
scale_y_continuous(formatter = 'percent');
ggplot(mydataf, aes(x = levels(mydataf), y = (..count..)/sum(..count..))) +
scale_y_continuous(formatter = 'percent') + geom_bar();
but all 4 give:
Error: ggplot2 doesn't know how to deal with data of class factor
The same error appears for the simple case of:
ggplot (data=mydataf, aes(levels(mydataf))) +
geom_bar()
so it's clearly something about how ggplot interacts with a single vector. I'm scratching my head; googling for that error gives a single result.
Recommended answer
Since this was answered there have been some meaningful changes to the ggplot syntax. Summing up the discussion in the comments above:
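library(ggplot2)
library(scales)  # provides percent for the axis labels
# mydataf must be a data frame here; foo stands for its categorical column
p <- ggplot(mydataf, aes(x = foo)) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  ## version 3.0.0
  scale_y_continuous(labels = percent)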
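Applied to the example data from the question, a minimal sketch might look like the following. It assumes the factor is first wrapped in a data frame, since ggplot() expects a data frame rather than a bare factor (which is what the "data of class factor" error is about). The names `df` and `foo` are placeholders chosen for illustration.

```r
library(ggplot2)
library(scales)

# Example data from the question; the NULL entries are silently dropped by c()
mydata <- c("aa", "bb", NULL, "bb", "cc", "aa", "aa", "aa", "ee", NULL, "cc")

# ggplot() needs a data frame, so wrap the factor in one
# (df and foo are illustrative names, not from the original post)
df <- data.frame(foo = factor(mydata))

# Bar height is each category's count divided by the total count
p <- ggplot(df, aes(x = foo)) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  scale_y_continuous(labels = percent)

print(p)
```

Because the proportion is computed inside the aesthetic, the same two layers can be reused for any other categorical column without creating an extra percentage variable.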
Here's a reproducible example using mtcars:
ggplot(mtcars, aes(x = factor(hp))) +
geom_bar(aes(y = (..count..)/sum(..count..))) +
scale_y_continuous(labels = percent) ## version 3.0.0
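On newer ggplot2 releases the `..count..` notation has been superseded by `after_stat()` (introduced in ggplot2 3.3.0, with `..count..` triggering a deprecation warning from 3.4.0). A hedged equivalent of the mtcars example, assuming a recent ggplot2 and a scales version that provides `label_percent()`:

```r
library(ggplot2)
library(scales)

# Same plot as above, written with after_stat() instead of ..count..
# (the name p2 is illustrative)
p2 <- ggplot(mtcars, aes(x = factor(hp))) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = label_percent())

print(p2)
```

The computed bar heights are the same as in the `..count..` version; only the way the computed statistic is referenced changes.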
This question is currently the #1 hit on Google for 'ggplot count vs percentage histogram', so hopefully this helps distill all the information currently housed in comments on the accepted answer.
Remark: If hp is not set as a factor, ggplot returns:
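The output that followed this remark is not preserved here. A minimal sketch to reproduce the comparison yourself, assuming the same mtcars setup (the names `p_numeric` and `p_factor` are illustrative): with `hp` left numeric, the bars sit at their numeric positions on a continuous x axis, whereas the factor version draws one evenly spaced bar per distinct value.

```r
library(ggplot2)
library(scales)

# hp left as numeric: x is a continuous position scale
p_numeric <- ggplot(mtcars, aes(x = hp)) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  scale_y_continuous(labels = percent)

# hp converted to a factor: x is discrete, one bar per distinct value
p_factor <- ggplot(mtcars, aes(x = factor(hp))) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  scale_y_continuous(labels = percent)

print(p_numeric)
print(p_factor)
```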