Normalize then plot two histograms together in R

Question

I realize there have been several posts asking how to plot two histograms together in R, either side by side (one plot with the bars next to each other) or overlaid, and also how to normalize data. Following the advice I've found, I can do one or the other, but not both.

Here's the setup. I have two data frames of different lengths and would like to plot the volume of the objects in each data frame as a histogram. For example, how many objects in data frame 1 are between 0.1 and 0.2 um^3, compared with how many in data frame 2 are between 0.1 and 0.2 um^3, and so on. Either overlaid or side by side would work.

Since one data frame has more measurements than the other, I have to normalize, so I use:

read.csv(ctl)
read.csv(exp)
h1=hist(ctl$Volume....)
h2=hist(exp$Volume....)

#to normalize#

h1$density=h1$counts/sum(h1$counts)*100
plot(h1,freq=FALSE....)
h2$density=h2$counts/sum(h2$counts)*100
plot(h2,freq=FALSE....)

I've been successful overlaying the un-normalized data using this method: http://www.r-bloggers.com/overlapping-histogram-in-r/ and also with the method from the post "plotting two histograms together", but I'm stuck when it comes to overlaying normalized data.

Recommended answer

ggplot2 makes it relatively straightforward to plot normalized histograms of groups with unequal size. Here's an example with fake data:

library(ggplot2)

# Fake data (two normal distributions)
set.seed(20)
dat1 = data.frame(x=rnorm(1000, 100, 10), group="A")
dat2 = data.frame(x=rnorm(2000, 120, 20), group="B")
dat = rbind(dat1, dat2)

ggplot(dat, aes(x, fill=group, colour=group)) +
  geom_histogram(breaks=seq(0,200,5), alpha=0.6, 
                 position="identity", lwd=0.2) +
  ggtitle("Unnormalized")

ggplot(dat, aes(x, fill=group, colour=group)) +
  geom_histogram(aes(y=..density..), breaks=seq(0,200,5), alpha=0.6, 
                 position="identity", lwd=0.2) +
  ggtitle("Normalized")
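
For readers who would rather stay with base graphics, as in the question, the same normalization can be sketched with hist(). This is only an illustration under stated assumptions: ctl_vol and exp_vol are made-up vectors standing in for the two Volume columns, and the bin width of 5 is arbitrary. The key point is that both histograms share one set of breaks, so the per-bin percentages are comparable.

```r
# Made-up stand-ins for the two Volume columns (hypothetical data)
set.seed(20)
ctl_vol <- rnorm(1000, 100, 10)
exp_vol <- rnorm(2000, 120, 20)

# One shared set of breaks covering both samples, so the bins line up
lo   <- floor(min(ctl_vol, exp_vol))
hi   <- ceiling(max(ctl_vol, exp_vol)) + 5
brks <- seq(lo, hi, by = 5)

# Bin each sample without drawing anything yet
h1 <- hist(ctl_vol, breaks = brks, plot = FALSE)
h2 <- hist(exp_vol, breaks = brks, plot = FALSE)

# Percent of each sample that falls in each bin
pct1 <- 100 * h1$counts / sum(h1$counts)
pct2 <- 100 * h2$counts / sum(h2$counts)

# Overlaid: store the percentages in $density and plot with freq = FALSE
h1$density <- pct1
h2$density <- pct2
plot(h1, freq = FALSE, col = rgb(0, 0, 1, 0.4),
     ylim = c(0, max(pct1, pct2)),
     main = "Normalized", xlab = "Volume", ylab = "Percent of sample")
plot(h2, freq = FALSE, col = rgb(1, 0, 0, 0.4), add = TRUE)

# Side by side: one pair of bars per bin, labelled by the bin's lower edge
barplot(rbind(pct1, pct2), beside = TRUE,
        names.arg = head(brks, -1),
        col = c("blue", "red"), legend.text = c("ctl", "exp"),
        xlab = "Volume (lower bin edge)", ylab = "Percent of sample")
```

Overwriting $density with percentages is the same trick the question uses; it works because plot() on a histogram object draws whatever is stored in $density when freq = FALSE.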

If you want to make overlaid density plots, you can do that as well. The adjust argument controls the bandwidth. Density plots are already normalized by default.

ggplot(dat, aes(x, fill=group, colour=group)) +
  geom_density(alpha=0.4, lwd=0.8, adjust=0.5) 

UPDATE: In answer to your comment, the following code should do it. (..density..)/sum(..density..) is computed over the bins of both groups at once, so the bar heights of the two histograms together add up to 1 and those of each individual group add up to 0.5. You therefore have to multiply by 2 so that each group is individually normalized to 1. In general, you have to multiply by n, where n is the number of groups. This seems kind of kludgy and there may be a more elegant approach.

library(scales) # For percent_format()

ggplot(dat, aes(x, fill=group, colour=group)) +
  geom_histogram(aes(y=2*(..density..)/sum(..density..)), breaks=seq(0,200,5), alpha=0.6, 
                 position="identity", lwd=0.2) +
  scale_y_continuous(labels=percent_format())
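
One possible way to avoid the hard-coded factor of 2 is to divide each bin's count by the total count of its own group. The sketch below assumes ggplot2 3.3.0 or later, where after_stat() is available (newer releases deprecate the ..density.. notation in its favour), and it relies on the binned data carrying a group column that ave() can split on. It reuses the fake data from above and is meant as an illustration, not a definitive recipe.

```r
library(ggplot2)

# Same fake data as in the answer above
set.seed(20)
dat1 <- data.frame(x = rnorm(1000, 100, 10), group = "A")
dat2 <- data.frame(x = rnorm(2000, 120, 20), group = "B")
dat  <- rbind(dat1, dat2)

# Bar height = bin count / total count of that bin's group,
# so each group's bars add up to 1 however many groups there are
p <- ggplot(dat, aes(x, fill = group, colour = group)) +
  geom_histogram(aes(y = after_stat(count / ave(count, group, FUN = sum))),
                 breaks = seq(0, 200, 5), alpha = 0.6,
                 position = "identity") +
  scale_y_continuous(labels = scales::percent_format()) +
  ylab("Percent of group")

print(p)
```

Because the denominator is computed per group, the same expression should carry over unchanged to three or more groups.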
