ggplot2 - Multi-group histogram with in-group proportions rather than frequency

Question

I have three cohorts of students identified by an ExperimentCohort factor. For each student, I have a LetterGrade, also a factor. I'd like to plot a histogram-like bar graph of LetterGrade for each ExperimentCohort. Using

ggplot(df, alpha = 0.2,
       aes(x = LetterGrade, group = ExperimentCohort, fill = ExperimentCohort)) +
  geom_bar(position = "dodge")

gets me very close, but the three ExperimentCohorts don't have the same number of students. To compare these on a more even field, I'd like the y-axis to be the in-cohort proportion of each letter-grade. So far, short of calculating this proportion and putting it in a separate dataframe before plotting, I have not been able to find a way to do this.

Every solution to a similar question on SO and elsewhere involves aes(y = ..count../sum(..count..)), but sum(..count..) is executed across the whole dataframe rather than within each cohort. Anyone got a suggestion? Here's code to create an example dataframe:

df <- data.frame(ID = 1:60,
        LetterGrade = sample(c("A", "B", "C", "D", "E", "F"), 60, replace = TRUE),
        ExperimentCohort = sample(c("One", "Two", "Three"), 60, replace = TRUE))

Thanks.

Solution

Wrong solution

You can use stat_bin() and y=..density.. to get percentages in each group.

ggplot(df, alpha = 0.2,
      aes(x = LetterGrade, group = ExperimentCohort, fill = ExperimentCohort))+
      stat_bin(aes(y=..density..), position='dodge')

UPDATE - correct solution

As pointed out by @rpierce, y=..density.. calculates density values for each group, not percentages (they are not the same).

To get correct percentages, one way is to calculate them before plotting. For this I used the function ddply() from the plyr package. Within each ExperimentCohort, the proportions are calculated with prop.table() and table() and saved as prop. The LetterGrade labels are recovered with names() and table().

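library(plyr)  # provides ddply() and summarise()
# For each cohort: the share of each letter grade within that cohort
df.new <- ddply(df, .(ExperimentCohort), summarise,
                prop = prop.table(table(LetterGrade)),
                LetterGrade = names(table(LetterGrade)))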

 head(df.new)
  ExperimentCohort       prop LetterGrade
1              One 0.21739130           A
2              One 0.08695652           B
3              One 0.13043478           C
4              One 0.13043478           D
5              One 0.30434783           E
6              One 0.13043478           F
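
As a cross-check on the per-cohort calculation, the same in-cohort proportions can be sketched in base R with a two-way table and prop.table(margin = 1). This is a minimal sketch, not part of the original answer: the set.seed() call and the names tab, props and df.check are assumptions added for illustration. Unlike the ddply() version, it also produces a row with proportion 0 for any grade that does not occur in a cohort.

```r
# Self-contained sketch; set.seed() is an added assumption for reproducibility.
set.seed(1)
df <- data.frame(ID = 1:60,
                 LetterGrade = sample(c("A", "B", "C", "D", "E", "F"), 60, replace = TRUE),
                 ExperimentCohort = sample(c("One", "Two", "Three"), 60, replace = TRUE))

# Cross-tabulate cohort by grade, then divide each row by its row total
# (margin = 1), so the proportions are computed within each cohort.
tab <- table(ExperimentCohort = df$ExperimentCohort, LetterGrade = df$LetterGrade)
props <- prop.table(tab, margin = 1)

# Long format: one row per cohort/grade pair, proportion stored in `prop`.
df.check <- as.data.frame(props, responseName = "prop")

# Each cohort's proportions should sum to 1.
rowSums(props)
```

If the row sums are all 1, each cohort has been normalised by its own size rather than by the size of the whole data frame.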

Now use this new data frame for plotting. Since the proportions are already calculated, supply them as the y values and add stat="identity" inside geom_bar() so that the bars show the supplied values instead of counts.

ggplot(df.new,aes(LetterGrade,prop,fill=ExperimentCohort))+
  geom_bar(stat="identity",position='dodge')
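
An alternative that avoids precomputing a data frame, offered as a sketch rather than as part of the original answer: stat_count(), the default stat of geom_bar(), exposes a computed variable prop, the groupwise proportion. With group = ExperimentCohort it is normalised within each cohort. The sketch assumes ggplot2 3.3.0 or later for after_stat(); the set.seed() call and the name p are assumptions added for illustration.

```r
library(ggplot2)  # after_stat() assumes ggplot2 >= 3.3.0

# Self-contained sketch; set.seed() is an added assumption for reproducibility.
set.seed(1)
df <- data.frame(ID = 1:60,
                 LetterGrade = sample(c("A", "B", "C", "D", "E", "F"), 60, replace = TRUE),
                 ExperimentCohort = sample(c("One", "Two", "Three"), 60, replace = TRUE))

# `prop` is computed by stat_count() separately for each group, so with
# group = ExperimentCohort each cohort's bars sum to 1.
p <- ggplot(df, aes(x = LetterGrade, group = ExperimentCohort,
                    fill = ExperimentCohort)) +
  geom_bar(aes(y = after_stat(prop)), position = "dodge")
p
```

The explicit group aesthetic matters here: it defines the denominator used for prop, which is why the bars are normalised per cohort rather than over the whole data frame.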
