Normalizing y-axis in histograms in R ggplot to proportion by group


Problem description

My question is very similar to "Normalizing y-axis in histograms in R ggplot to proportion", except that I have two groups of data of different sizes, and I would like each proportion to be relative to its group's size rather than the total size.

To make it clearer, let's say I have two sets of data in a data frame:

dataA<-rnorm(100,3,sd=2)
dataB<-rnorm(400,5,sd=3)
all<-data.frame(dataset=c(rep('A',length(dataA)),rep('B',length(dataB))),value=c(dataA,dataB))

I can plot the two distributions together with:

ggplot(all,aes(x=value,fill=dataset))+geom_histogram(alpha=0.5,position='identity',binwidth=0.5)

and instead of the frequency on the Y axis I can have the proportion with:

ggplot(all,aes(x=value,fill=dataset))+geom_histogram(aes(y=..count../sum(..count..)),alpha=0.5,position='identity',binwidth=0.5)

But this gives the proportion relative to the total data size (500 points here): is it possible to have it relative to each group size?

My goal here is to make it possible to visually compare the proportion of values in a given bin between A and B, independently of their respective sizes. Ideas which differ from my original one are also welcome!

Thanks!

Solution

Like this? [edited based on OP's comment]

ggplot(all,aes(x=value,fill=dataset))+
  geom_histogram(aes(y=0.5*..density..),
                 alpha=0.5,position='identity',binwidth=0.5)

Using y=..density.. scales the histograms so the area under each is 1, or sum(binwidth*y)=1. As a result, you would use y = binwidth*..density.. to have y represent the fraction of the total in each bin. In your case, binwidth=0.5.
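The identity sum(binwidth*y)=1 can be checked numerically without any plotting. The following is a minimal base R sketch, not part of the original answer: it uses `hist(..., plot = FALSE)` in place of ggplot2's binning, and the names `breaks`, `hA`, `hB`, `propA` and `propB` are illustrative. The bin edges chosen here may not line up with the ones `geom_histogram` picks, but the relationship between density and per-group proportion is the same.

```r
set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible

dataA <- rnorm(100, 3, sd = 2)
dataB <- rnorm(400, 5, sd = 3)
binwidth <- 0.5

# Shared bin edges covering both groups, spaced binwidth apart
rng <- range(c(dataA, dataB))
breaks <- seq(floor(rng[1]), ceiling(rng[2]), by = binwidth)

# Bin each group separately; $density integrates to 1 within each group
hA <- hist(dataA, breaks = breaks, plot = FALSE)
hB <- hist(dataB, breaks = breaks, plot = FALSE)

# binwidth * density is the fraction of that group's points in each bin
propA <- binwidth * hA$density
propB <- binwidth * hB$density

# Each set of proportions sums to 1, regardless of group size
print(sum(propA))
print(sum(propB))

# Equivalent to dividing the counts by the group's own size
print(all.equal(propA, hA$counts / length(dataA)))
print(all.equal(propB, hB$counts / length(dataB)))
```

Because each group is normalised by its own size, the 100-point and 400-point groups end up on the same 0 to 1 scale, which is what makes the overlaid histograms comparable.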

IMO this is a little easier to interpret:

ggplot(all,aes(x=value,fill=dataset))+
  geom_histogram(aes(y=0.5*..density..),binwidth=0.5)+
  facet_wrap(~dataset,nrow=2)
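
In ggplot2 3.4.0 and later the `..density..` notation is deprecated in favour of `after_stat()`. Below is a sketch of the overlaid plot in that notation, assuming a recent ggplot2 is installed. The names `df`, `p`, `built` and `per_group` are illustrative, and `width` is the bin width computed by `stat_bin`, used here so that 0.5 does not have to be repeated inside `aes()`.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible

# Same two groups as in the question; `df` avoids masking base::all()
df <- data.frame(
  dataset = rep(c("A", "B"), times = c(100, 400)),
  value   = c(rnorm(100, 3, sd = 2), rnorm(400, 5, sd = 3))
)

# density * width gives the fraction of each group's points per bin
p <- ggplot(df, aes(x = value, fill = dataset)) +
  geom_histogram(aes(y = after_stat(density * width)),
                 alpha = 0.5, position = "identity", binwidth = 0.5) +
  ylab("proportion within group")

# Inspect the computed bar heights: they should sum to 1 within each group
built <- layer_data(p, 1)
per_group <- tapply(built$y, built$group, sum)
print(per_group)
```

Adding `facet_wrap(~dataset, nrow = 2)` to `p` gives the faceted variant with the same y scaling.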
