Histogram ggplot: Show count label for each bin for each category


Question

I'll use the diamonds data set from ggplot2 to illustrate my point. I want to draw a histogram of price, but I want to show the count for each bin for each cut. This is my code:

ggplot(aes(x = price ) , data = diamonds_df) + 
geom_histogram(aes(fill = cut , binwidth = 1500)) +
stat_bin(binwidth= 1500, geom="text", aes(label=..count..) , 
vjust = -1) + 
scale_x_continuous(breaks = seq(0 , max(stores_1_5$Weekly_Sales) , 1500 ) 
, labels = comma)

Here is my current plot.

As you can see, each number shows the count for all cuts combined in a bin. I want to display the count for each cut in each bin.

Bonus points if I can also configure the y axis manually, so that it shows breaks at a step of my choosing instead of every 5000.

Answer

Update for ggplot2 2.x

You can now center labels within stacked bars without pre-summarizing the data using position=position_stack(vjust=0.5). For example:

ggplot(aes(x = price ) , data = diamonds) + 
  geom_histogram(aes(fill=cut), binwidth=1500, colour="grey20", lwd=0.2) +
  stat_bin(binwidth=1500, geom="text", colour="white", size=3.5,
           aes(label=..count.., group=cut), position=position_stack(vjust=0.5)) +
  scale_x_continuous(breaks=seq(0,max(diamonds$price), 1500))
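
For readers on a more recent ggplot2, the same plot can be sketched with the newer notation. This is not part of the original answer; it assumes ggplot2 3.4.0 or later, where `after_stat(count)` replaces `..count..` and the `linewidth` argument replaces `lwd` for line widths. The object name `p` is only for illustration.

```r
library(ggplot2)

# Stacked histogram of price, filled by cut, with one centered count label
# per cut in each bin (assumes ggplot2 >= 3.4.0)
p <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(aes(fill = cut), binwidth = 1500,
                 colour = "grey20", linewidth = 0.2) +
  stat_bin(aes(label = after_stat(count), group = cut),
           binwidth = 1500, geom = "text", colour = "white", size = 3.5,
           position = position_stack(vjust = 0.5)) +
  scale_x_continuous(breaks = seq(0, max(diamonds$price), 1500))

print(p)
```

Both layers use the same `binwidth` and the same grouping by `cut`, so each label should land in the bar segment it counts.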

Original Answer

You can get the counts for each value of cut by adding cut as a group aesthetic to stat_bin. I also moved binwidth outside of aes, which was causing binwidth to be ignored in your original code:

ggplot(aes(x = price ), data = diamonds) + 
  geom_histogram(aes(fill = cut ), binwidth=1500, colour="grey20", lwd=0.2) +
  stat_bin(binwidth=1500, geom="text", colour="white", size=3.5,
           aes(label=..count.., group=cut, y=0.8*(..count..))) +
  scale_x_continuous(breaks=seq(0,max(diamonds$price), 1500))

One issue with the code above is that I'd like the labels to be vertically centered within each bar section, but I'm not sure how to do that within stat_bin, or if it's even possible. Multiplying by 0.8 (or whatever) moves each label by a different relative amount. So, to get the labels centered, I created a separate data frame for the labels in the code below:

# Load the packages used below
library(dplyr)
library(ggplot2)

# Create text labels
dat = diamonds %>% 
  group_by(cut, 
           price=cut(price, seq(0,max(diamonds$price)+1500,1500),
                     labels=seq(0,max(diamonds$price),1500), right=FALSE)) %>%
  summarise(count=n()) %>%
  group_by(price) %>%
  mutate(ypos = cumsum(count) - 0.5*count) %>%
  ungroup() %>%
  mutate(price = as.numeric(as.character(price)) + 750)

ggplot(aes(x = price ) , data = diamonds) + 
  geom_histogram(aes(fill = cut ), binwidth=1500, colour="grey20", lwd=0.2) +
  geom_text(data=dat, aes(label=count, y=ypos), colour="white", size=3.5)
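
The binning and centering arithmetic behind the label data frame can be checked on a toy example. The sketch below uses base R only. The vectors `prices` and `counts` are made-up values for illustration, and it assumes the bar segments are stacked bottom to top in the order given, which may not match the stacking order of every ggplot2 version.

```r
# Toy prices (hypothetical values, not taken from diamonds)
prices <- c(326, 1499, 1500, 2999, 3000)
binwidth <- 1500

# Left-closed bins [0, 1500), [1500, 3000), ... labelled by their lower edge
breaks <- seq(0, max(prices) + binwidth, binwidth)
bin_left <- cut(prices, breaks,
                labels = seq(0, max(prices), binwidth), right = FALSE)

# Convert the factor labels back to numbers and shift to the bin midpoint
bin_mid <- as.numeric(as.character(bin_left)) + binwidth / 2
print(bin_mid)
# 750 750 2250 2250 3750

# Counts of three stacked segments within a single bin (hypothetical)
counts <- c(10, 20, 30)

# Top of each segment minus half its height gives its vertical center
ypos <- cumsum(counts) - 0.5 * counts
print(ypos)
# 5 20 45
```

The `+ 750` in the answer's code is the same midpoint shift, half of the 1500 bin width.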

To configure the breaks on the y axis, just add scale_y_continuous(breaks=seq(0,20000,2000)) or whatever breaks you'd like.
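
As a minimal sketch of that last step, the y-axis breaks can be built once and passed to the scale. The names `y_breaks` and `p` are only illustrative, and the step of 2000 is just the value suggested above.

```r
library(ggplot2)

# Breaks every 2000 from 0 to 20000; change `by` to use a different step
y_breaks <- seq(0, 20000, by = 2000)

p <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(aes(fill = cut), binwidth = 1500, colour = "grey20") +
  scale_y_continuous(breaks = y_breaks)

print(p)
```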
