Show % instead of counts in charts of categorical variables


Problem description

I'm plotting a categorical variable, and instead of showing the counts for each category value, I'm looking for a way to get ggplot to display the percentage of values in that category. Of course, it is possible to create another variable with the calculated percentage and plot that one, but I have to do it several dozen times and I hope to achieve that in one command.
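For comparison, the "create another variable with the calculated percentage" route can itself be wrapped so that each variable needs only one call. The sketch below is an illustration, not the accepted answer's method: `plot_pct()` is a hypothetical helper name, and it assumes ggplot2 (>= 2.2.0, for `geom_col()`) and the `scales` package are installed.

```r
library(ggplot2)
library(scales)

# Hypothetical helper: bar chart of the share of each level of a vector.
# The percentages are precomputed with prop.table(), then plotted as-is.
plot_pct <- function(x) {
  tab <- prop.table(table(x))  # share of each level; the shares sum to 1
  df <- data.frame(
    level = factor(names(tab), levels = names(tab)),  # keep table order
    pct   = as.numeric(tab)
  )
  ggplot(df, aes(x = level, y = pct)) +
    geom_col() +                          # bar heights taken directly from pct
    scale_y_continuous(labels = percent)  # 0.4 is shown as 40%
}

p <- plot_pct(factor(c("aa", "bb", "bb", "cc", "aa")))
```

The trade-off is that the percentages live in a throwaway data frame inside the function rather than being computed by a ggplot stat, which keeps the plotting call trivial.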

I was experimenting with something like

qplot(mydataf) +
  stat_bin(aes(n = nrow(mydataf), y = ..count../n)) +
  scale_y_continuous(formatter = "percent")

but I must be using it incorrectly, as I got errors.

To easily reproduce the setup, here's a simplified example:

library(ggplot2)

mydata <- c("aa", "bb", NULL, "bb", "cc", "aa", "aa", "aa", "ee", NULL, "cc");  # note: c() drops NULL
mydataf <- factor(mydata);
qplot(mydataf); # this shows the count; I'm looking to see % displayed.

In the real case I'll probably use ggplot instead of qplot, but the right way to use stat_bin still eludes me.

I've also tried these four approaches:

ggplot(mydataf, aes(y = (..count..)/sum(..count..))) + 
  scale_y_continuous(formatter = 'percent');

ggplot(mydataf, aes(y = (..count..)/sum(..count..))) + 
  scale_y_continuous(formatter = 'percent') + geom_bar();

ggplot(mydataf, aes(x = levels(mydataf), y = (..count..)/sum(..count..))) + 
  scale_y_continuous(formatter = 'percent');

ggplot(mydataf, aes(x = levels(mydataf), y = (..count..)/sum(..count..))) + 
  scale_y_continuous(formatter = 'percent') + geom_bar();

but all 4 give:

Error: ggplot2 doesn't know how to deal with data of class factor

The same error appears for the simple case of

ggplot (data=mydataf, aes(levels(mydataf))) +
  geom_bar()

so it's clearly something about how ggplot interacts with a single vector. I'm scratching my head; googling for that error gives a single result.

Solution

Since this question was first answered, there have been some meaningful changes to the ggplot2 syntax. Summing up the discussion in the comments above:

 require(ggplot2)
 require(scales)

 # mydataf must be a data frame here, with the categorical variable in a column named foo
 p <- ggplot(mydataf, aes(x = foo)) +  
        geom_bar(aes(y = (..count..)/sum(..count..))) + 
        ## version 3.0.9
        # scale_y_continuous(labels = percent_format())
        ## version 3.1.0
        scale_y_continuous(labels=percent)
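The snippet above expects `mydataf` to be a data frame with a column `foo`, whereas in the question `mydataf` is a bare factor, which is what triggers "ggplot2 doesn't know how to deal with data of class factor". A minimal sketch, assuming the question's sample data and reusing the placeholder column name `foo`, wraps the factor in a one-column data frame first:

```r
library(ggplot2)
library(scales)

# The question's sample data; c() silently drops NULL, leaving 9 strings
mydata <- c("aa", "bb", NULL, "bb", "cc", "aa", "aa", "aa", "ee", NULL, "cc")

# ggplot() needs a data frame, so put the factor into a column named foo
df <- data.frame(foo = factor(mydata))

p <- ggplot(df, aes(x = foo)) +
  geom_bar(aes(y = (..count..) / sum(..count..))) +  # share of all rows per level
  scale_y_continuous(labels = percent)
```

With this data frame in place, the same call produces one bar per level (`aa`, `bb`, `cc`, `ee`) whose heights add up to 100%.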

Here's a reproducible example using mtcars:

 ggplot(mtcars, aes(x = factor(hp))) +  
        geom_bar(aes(y = (..count..)/sum(..count..))) + 
        ## scale_y_continuous(labels = percent_format()) #version 3.0.9
        scale_y_continuous(labels = percent) #version 3.1.0
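The `..count..` notation used above still runs in recent ggplot2 releases, but since ggplot2 3.4.0 it is deprecated in favor of `after_stat()`, which has been available since 3.3.0. A sketch of the same mtcars plot in that style, assuming ggplot2 >= 3.3.0 and the `scales` package:

```r
library(ggplot2)

# after_stat() evaluates its expression on the computed stat data,
# so count / sum(count) gives each bar's share of all cars
p <- ggplot(mtcars, aes(x = factor(hp))) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = scales::percent)
```

Calling `scales::percent` with the namespace prefix avoids attaching `scales` just for the label formatter.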

This question is currently the #1 hit on Google for 'ggplot count vs percentage histogram', so hopefully this helps distill all the information currently housed in comments on the accepted answer.

Remark: If hp is not set as a factor, ggplot returns:
