Stacked histogram from already summarized counts using ggplot2


Question

I would like some help coloring a ggplot2 histogram generated from already-summarized count data.

The data are something like counts of males and females living in a number of different areas. It's easy enough to plot the histogram for the total counts (i.e. males + females):

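library(ggplot2)

set.seed(1)
N <- 100
# C1 and C2 are the two per-area counts (e.g. males and females)
X <- data.frame(C1 = rnbinom(N, 15, 0.1), C2 = rnbinom(N, 15, 0.1))
X$C <- X$C1 + X$C2  # total count per area
ggplot(X, aes(x = C)) + geom_histogram()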

However, I'd like to color each bar according to the relative contribution from C1 and C2, so that I get the same histogram (i.e. overall bar heights) as in the above example, plus I see the proportion of type "C1" and "C2" individuals as in a stacked bar chart.

Suggestions for a clean way to do this with ggplot2, using data like "X" in the example?

Answer

Very quickly, you can do what the OP wants using the stat="identity" option and the plyr package to manually calculate the histogram, like so:

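library(plyr)

# Center of the 20-wide bin that each total count C falls into
X$mid <- floor(X$C/20)*20+10
# total: number of areas in the bin; split: the part of that height attributed to C1
X_plot <- ddply(X, .(mid), summarize, total=length(C), split=sum(C1)/sum(C)*length(C))

ggplot(data=X_plot) +
  geom_histogram(aes(x=mid, y=total), fill="blue", stat="identity") +
  geom_histogram(aes(x=mid, y=split), fill="deeppink", stat="identity")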

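We basically just make a 'mid' column that gives the center of each 20-wide bin, and then draw two layers on the same axes: one whose height is the number of areas in each bin (total), and one drawn over it whose height is that count scaled by C1's share of C in the bin (split). The pink part of each bar therefore shows the contribution of C1, and the blue part left visible above it shows C2. You should be able to customize from here.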
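As a variation on the overlay above, the same bar heights can be put into long format and stacked with a fill aesthetic, which also gives a legend. This is only a sketch and not the code from the answer: it uses base R's tapply in place of plyr, geom_col (which draws bars from precomputed heights) in place of geom_histogram with stat="identity", and the names n_areas, c1_share and X_long are made up for illustration.

```r
library(ggplot2)

set.seed(1)
N <- 100
X <- data.frame(C1 = rnbinom(N, 15, 0.1), C2 = rnbinom(N, 15, 0.1))
X$C <- X$C1 + X$C2

# Bin centers for 20-wide bins, as in the answer
X$mid <- floor(X$C / 20) * 20 + 10

# Per bin: number of areas, and C1's share of the summed total C
n_areas  <- tapply(X$C, X$mid, length)
c1_share <- tapply(X$C1, X$mid, sum) / tapply(X$C, X$mid, sum)
mids     <- as.numeric(names(n_areas))

# Long format: one row per (bin, group); the two heights in a bin add up to n_areas
# (X_long and its column names are illustrative, not from the original answer)
X_long <- data.frame(
  mid    = rep(mids, times = 2),
  group  = rep(c("C1", "C2"), each = length(mids)),
  height = unname(c(n_areas * c1_share, n_areas * (1 - c1_share)))
)

# geom_col stacks the groups by default; width = 20 makes the bars span the bins
p <- ggplot(X_long, aes(x = mid, y = height, fill = group)) +
  geom_col(width = 20) +
  labs(x = "C", y = "count")
print(p)
```

Because the heights are stacked rather than overlaid, each group gets its own row per bin, and colors can be changed with scale_fill_manual instead of hard-coding fill in each layer.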

Update 1: I realized I made a small error in calculating the mids. Fixed now. Also, I don't know why I used a 'ddply' statement to calculate the mids. That was silly. The new code is clearer and more concise.

Update 2: I returned to view a comment and noticed something slightly horrifying: I was using sums as the histogram frequencies. I have cleaned up the code a little and also added suggestions from the comments concerning the coloring syntax.
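The difference mentioned in Update 2 can be checked directly. The sketch below, which uses base R only and is not part of the original answer, computes both quantities per bin: the number of areas (the histogram frequency) and the sum of C (the total number of individuals, which is not a frequency). The names counts and sums are illustrative.

```r
set.seed(1)
N <- 100
X <- data.frame(C1 = rnbinom(N, 15, 0.1), C2 = rnbinom(N, 15, 0.1))
X$C <- X$C1 + X$C2
X$mid <- floor(X$C / 20) * 20 + 10

# Histogram frequencies: how many areas fall into each bin
counts <- tapply(X$C, X$mid, length)
# Not frequencies: total number of individuals in each bin
sums <- tapply(X$C, X$mid, sum)

# Every area lands in exactly one bin, so the frequencies add up to N
stopifnot(sum(counts) == N)
print(rbind(counts, sums))
```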
